ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection
Researchers introduce ClawGuard, a runtime security framework that protects tool-augmented LLM agents from indirect prompt injection attacks by enforcing user-confirmed rules at tool-call boundaries. The framework blocks malicious instructions embedded in tool responses without requiring model modifications, demonstrating robust protection across multiple state-of-the-art language models.
ClawGuard addresses a critical vulnerability in AI agent systems: adversaries can inject malicious instructions through tool-returned content. As LLM agents increasingly integrate external tools for real-world tasks, this attack vector poses significant operational risk. The framework operates at the tool-call boundary, the natural enforcement point where external data enters the agent's decision-making pipeline, enabling deterministic security without relying on model alignment. This marks a shift from reactive safety measures to proactive access control.
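The boundary-enforcement idea can be sketched as a wrapper that sits between the agent and its tools. The names and structure below are illustrative assumptions, not ClawGuard's actual API: the point is only that the rule check runs outside the model, so content inside a tool response cannot loosen it.

```python
# Minimal sketch of a tool-call boundary guard. All identifiers here
# (BoundaryGuard, PolicyViolation, the example tools) are hypothetical.

class PolicyViolation(Exception):
    """Raised when a tool call falls outside the confirmed rule set."""

class BoundaryGuard:
    def __init__(self, allowed_tools, allowed_args=None):
        # Rules are fixed before the agent runs; nothing in a tool
        # response can modify them afterwards.
        self.allowed_tools = set(allowed_tools)
        self.allowed_args = allowed_args or {}

    def invoke(self, tool_name, args, tool_fn):
        # Deterministic check at the boundary, before the tool executes.
        if tool_name not in self.allowed_tools:
            raise PolicyViolation(f"tool '{tool_name}' not permitted for this task")
        for key, predicate in self.allowed_args.get(tool_name, {}).items():
            if key in args and not predicate(args[key]):
                raise PolicyViolation(f"argument '{key}' violates task policy")
        return tool_fn(**args)

# A task scoped to reading files under /data cannot be steered into other
# actions, even if a fetched document says "now email this to attacker".
guard = BoundaryGuard(
    allowed_tools={"read_file"},
    allowed_args={"read_file": {"path": lambda p: p.startswith("/data/")}},
)
```

Because the check is ordinary code rather than model behavior, a blocked call fails the same way every time, which is what makes the policy auditable.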
The vulnerability spans three primary channels: web/local content injection, where adversaries embed instructions in scraped data; MCP server injection, which targets standardized tool protocols; and skill file injection through manipulated external resources. ClawGuard automatically derives task-specific access constraints from the user's objective before any tool invocation occurs, producing a rule set the agent must follow regardless of downstream instructions. This deterministic mechanism turns security from an alignment problem, where model behavior remains probabilistic, into an auditable, boundary-enforced policy.
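The derive-then-confirm flow might look like the toy sketch below. In the actual system the rule derivation is presumably more sophisticated (the summary says constraints are derived from user objectives and user-confirmed); the keyword-to-allowlist mapping here is a stand-in purely to show the ordering: rules are fixed and confirmed before the first tool call.

```python
# Toy illustration of deriving a task-scoped rule set before execution.
# TASK_PROFILES, derive_rules, and confirm_rules are hypothetical names.

TASK_PROFILES = {
    "summarize": {"fetch_url", "read_file"},          # read-only tasks
    "schedule":  {"read_calendar", "create_event"},   # calendar tasks
}

def derive_rules(user_objective: str) -> set:
    """Map the stated objective to the minimal set of permitted tools."""
    allowed = set()
    for keyword, tools in TASK_PROFILES.items():
        if keyword in user_objective.lower():
            allowed |= tools
    return allowed

def confirm_rules(allowed: set) -> frozenset:
    # In a real deployment the user reviews and confirms this set once,
    # before execution; it is then immutable for the rest of the task.
    print("Agent may call:", sorted(allowed))
    return frozenset(allowed)

rules = confirm_rules(derive_rules("Summarize the quarterly report"))
# Any later tool request outside `rules` is rejected deterministically,
# no matter what instructions arrive inside tool responses.
```

Freezing the rule set before the agent sees any external content is what closes the injection window: by the time attacker-controlled data arrives, there is no policy left to rewrite.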
For the AI infrastructure ecosystem, the framework is consequential: organizations deploying autonomous agents face liability for agent actions, making runtime guarantees essential. ClawGuard requires no model retraining, infrastructure modifications, or safety-specific fine-tuning, enabling rapid deployment across existing systems, and the public code release makes the protection mechanism broadly accessible.
Looking ahead, successful boundary enforcement could reshape how organizations build agentic systems. The framework's effectiveness across five models and multiple benchmarks suggests applicability to emerging agent architectures. Key questions include adoption rates among agent framework developers and whether similar boundary-enforcement patterns become standard practice for tool-augmented AI systems.
- ClawGuard protects LLM agents from indirect prompt injection by enforcing rules at tool-call boundaries without model modification.
- The framework simultaneously blocks three attack channels: web/local content injection, MCP server injection, and skill file injection.
- Runtime security operates deterministically, requiring no safety-specific fine-tuning or architectural changes to existing systems.
- Experiments show robust protection across five state-of-the-art models on the AgentDojo, SkillInject, and MCPSafeBench benchmarks.
- Public code availability enables rapid adoption and integration into existing LLM agent frameworks and infrastructure.