🧠 AI | ⚪ Neutral | Importance 6/10

RecurGuard: Runtime Monitoring for Reasoning-Token Consumption Attacks

arXiv – CS AI|Abid Aziz, Hafsa Binte Kibria|June 9, 2026 at 04:00 AM

🤖 AI Summary

Researchers introduce RecurGuard, a runtime monitoring system that defends reasoning-capable large language models against prompt injection attacks designed to exhaust generation budgets on decoy tasks. On DS-R1-Qwen-7B, the defense detects 99% of OverThink attacks and 92% of ExtendAttack variants while keeping false positives minimal, though adaptive adversaries can partially evade detection with decoy tasks that stay topically related to the user's query.

Analysis

RecurGuard addresses a vulnerability in reasoning models that expose their internal reasoning traces, such as the Qwen-based DS-R1-Qwen-7B. Attackers inject prompts that redirect the model's reasoning toward meaningless decoy tasks, producing two failure modes: denial of service, when the generation budget runs out before an answer emerges, and denial of wallet, when users are billed for the inflated token count. The attack is hard to stop at the input because the injected tasks are syntactically benign and pass input-side classifiers, whereas the intermediate reasoning steps these models generate are visible to a monitoring system at runtime.

The broader context reflects the growing complexity of LLM security as models become more capable and expensive to operate. As reasoning tokens become a billable commodity and inference costs scale with model sophistication, denial-of-wallet attacks represent a direct economic threat to service providers and users alike. The paper's finding that 99% of OverThink attacks and 92% of ExtendAttack variants are caught on DS-R1-Qwen-7B demonstrates that runtime monitoring provides a practical defense layer beyond static prompt filtering.
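To make the idea of a runtime monitor concrete, here is a minimal sketch that watches a streamed reasoning trace and flags it once the trace stays off-topic for several chunks in a row. This is not RecurGuard's algorithm, which this summary does not describe. The word-overlap relevance score, the function names (`relevance`, `monitor_trace`), and the thresholds are all hypothetical stand-ins chosen for illustration.

```python
import re
from typing import Iterable


def _words(text: str) -> set[str]:
    """Lowercased word set: a crude, hypothetical stand-in for a semantic signal."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(query: str, chunk: str) -> float:
    """Fraction of the chunk's distinct words that also appear in the query."""
    chunk_words = _words(chunk)
    if not chunk_words:
        return 1.0  # an empty chunk carries no evidence of drift
    return len(chunk_words & _words(query)) / len(chunk_words)


def monitor_trace(
    query: str,
    chunks: Iterable[str],
    min_relevance: float = 0.2,  # assumed threshold, not from the paper
    patience: int = 3,           # assumed threshold, not from the paper
) -> tuple[bool, int]:
    """Return (flagged, chunks_consumed).

    The trace is flagged as soon as `patience` consecutive chunks score
    below `min_relevance`; a caller could then stop generation early.
    """
    off_topic_run = 0
    consumed = 0
    for chunk in chunks:
        consumed += 1
        if relevance(query, chunk) < min_relevance:
            off_topic_run += 1
        else:
            off_topic_run = 0
        if off_topic_run >= patience:
            return True, consumed
    return False, consumed


if __name__ == "__main__":
    query = "What is the capital of France?"
    trace = [
        "The capital of France is what",
        "solve sudoku grid",
        "fill row one",
        "check column two",
        "next row",
    ]
    print(monitor_trace(query, trace))
```

Because the monitor reads the trace as it is produced, it can stop a run before the full generation budget is spent, which a static input filter cannot do. A toy score like this one is also easy to fool: a decoy phrased in the query's own vocabulary would pass the overlap check.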

The adaptive evaluation reveals limitations: topical attacks that stay semantically relevant to the original query retain 11.9x amplification with a 50% miss rate, suggesting attackers can evade detection by crafting decoy tasks superficially related to the user's request. Under full semantic evasion, the defense cuts amplification from 22.8x undefended to 2.2x, roughly a tenfold reduction, so defenders still achieve meaningful though not absolute protection.
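The amplification figures above translate into cost with a little arithmetic. The sketch below is illustrative only: the three amplification factors come from the reported results, while the baseline token count, the per-token price, and the function names are hypothetical values chosen for the example.

```python
# Amplification factors reported in the analysis above.
RAW_AMPLIFICATION = 22.8       # undefended, full semantic evasion
DEFENDED_AMPLIFICATION = 2.2   # defended, full semantic evasion
TOPICAL_AMPLIFICATION = 11.9   # retained by topical attacks

# Hypothetical values, assumed purely for illustration.
BASELINE_REASONING_TOKENS = 1_000   # tokens a benign query might consume
PRICE_PER_MILLION_TOKENS = 10.00    # USD


def attack_cost(
    amplification: float,
    baseline_tokens: int = BASELINE_REASONING_TOKENS,
    price_per_million: float = PRICE_PER_MILLION_TOKENS,
) -> float:
    """USD cost of one query whose reasoning tokens grow by `amplification`."""
    tokens = baseline_tokens * amplification
    return tokens * price_per_million / 1_000_000


def reduction_factor(raw: float, defended: float) -> float:
    """How many times smaller the defended amplification is than the raw one."""
    return raw / defended


if __name__ == "__main__":
    for label, amp in [
        ("benign", 1.0),
        ("defended", DEFENDED_AMPLIFICATION),
        ("topical", TOPICAL_AMPLIFICATION),
        ("raw", RAW_AMPLIFICATION),
    ]:
        print(f"{label:>8}: {amp:5.1f}x -> ${attack_cost(amp):.4f} per query")
    factor = reduction_factor(RAW_AMPLIFICATION, DEFENDED_AMPLIFICATION)
    print(f"reduction under full semantic evasion: {factor:.1f}x")
```

Whatever price is assumed, the per-query bill scales linearly with the amplification factor, so a topical attack at 11.9x still costs several times what a defended run at 2.2x does.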

The fallback QDM monitor for models without exposed reasoning traces extends applicability but with unknown detection rates. Future work should focus on understanding why topical evasion succeeds and whether reasoning-agnostic defenses can close this gap.

Key Takeaways

→RecurGuard detects 99% of OverThink attacks and 92% of ExtendAttack variants on DS-R1-Qwen-7B with minimal false positives
→Topical prompt injection attacks can partially evade detection while retaining 11.9x resource amplification
→Reasoning trace visibility enables more sophisticated monitoring than input-only defenses but creates new attack surfaces
→Denial-of-wallet attacks represent a direct economic threat as reasoning tokens become billable infrastructure
→Runtime monitoring provides a practical middle layer between static prompt filtering and full semantic defenses