🧠 AI🔴 BearishImportance 7/10Actionable

From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

arXiv – CS AI|Jiejun Tan, Zhicheng Dou, Xinyu Yang, Yuyang Hu, Yiruo Cheng, Xiaoxi Li, Ji-Rong Wen|June 1, 2026 at 04:00 AM

🤖AI Summary

Researchers reveal a critical vulnerability in LLM agents operating in local workspaces, where attackers can plant hidden prompt injections across multiple steps to gain persistent control. The new ClawTrojan benchmark demonstrates 95.5% attack success rates against GPT-5.4, while a proposed defense mechanism called DASGuard offers runtime protection by tracing and sanitizing potentially malicious control text in sensitive files.

Analysis

The evolution of large language models from conversational interfaces to autonomous agents capable of file operations and tool integration creates a fundamentally different threat landscape than traditional AI safety concerns. This research exposes a sophisticated attack vector that exploits the temporal and spatial separation between malicious input injection and execution, allowing attackers to bypass defenses designed for single-turn interactions. Rather than deploying an immediately harmful prompt, adversaries can embed trojan instructions within files or tool outputs that agents later consume and execute, making detection significantly harder.

This vulnerability represents a natural escalation in AI security research as agents gain real-world operational capabilities. Previous defenses focused on blocking obviously malicious requests, but this multi-step attack paradigm operates in a gray zone where individual actions appear benign in isolation. An agent reading a file with embedded instructions or storing data for later sessions represents legitimate functionality, yet collectively these operations enable persistent backdoor access.

For developers and organizations deploying agentic systems in production environments, this research carries immediate practical implications. The high success rate against current models suggests that many existing deployments may lack adequate protections against this attack class. DASGuard's approach of tracing control content origin and implementing runtime sanitization provides a defensible strategy, though it requires careful implementation to avoid false positives that would limit agent functionality.

The research underscores that agentic AI systems require fundamentally different security architectures than traditional software, where trust boundaries and state management across sessions demand continuous re-evaluation. Organizations must implement provenance tracking for all inputs and develop comprehensive testing protocols that evaluate multi-step attack sequences rather than isolated prompts.

Key Takeaways

→Multi-step trojan attacks in LLM agents achieve 95.5% success rates by exploiting temporal separation between malicious injection and execution.
→Existing single-step prompt injection defenses fail to detect earlier write operations that plant persistent backdoors in local workspaces.
→DASGuard mitigates this threat by scanning files for control-like text, tracing its origin, and removing content from untrusted sources.
→Agentic AI systems require fundamentally different security architectures than traditional chatbots due to persistent state and file system access.
→Organizations deploying autonomous agents must implement provenance tracking and multi-step attack evaluation in their security testing protocols.

Mentioned in AI

Models

GPT-5OpenAI

#llm-security #prompt-injection #trojan-attacks #ai-agents #cybersecurity #ai-defense #local-execution #backdoor-attacks

Read Original →via arXiv – CS AI

Act on this with AI

Stay ahead of the market.

Connect your wallet to an AI agent. It reads balances, proposes swaps and bridges across 15 chains — you keep full control of your keys.

Connect Wallet to AI →How it works

AIMay 6

Your company’s AI could delete everything in 9 seconds. ServiceNow wants to be the kill switch

AIMay 6

Hut 8 (HUT) Stock Soars 37% on Massive $9.8 Billion AI Data Center Agreement

AIMay 6

From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

Your company’s AI could delete everything in 9 seconds. ServiceNow wants to be the kill switch

Hut 8 (HUT) Stock Soars 37% on Massive $9.8 Billion AI Data Center Agreement

S&P 500 and NASDAQ hit record highs as AI chip stocks surge