TRAP: Benchmark for Task-completion and Resistance to Active Privacy-extraction

arXiv – CS AI | Moon Ye-Bin, Nam Hyeon-Woo, Baek Seong-Eun, Yejin Yeo, Tae-Hyun Oh | June 19, 2026 at 04:00 AM

🤖AI Summary

Researchers introduce TRAP, a benchmark that evaluates whether AI agents can complete document-intensive tasks that require private information while resisting attempts to extract that information. Testing 22 models shows that all of them leak private data, and that stronger instruction-following ability correlates with higher exposure risk. A proposed structural isolation method, which replaces private fields with hash keys, shows promise in easing the fundamental trade-off between task accuracy and privacy protection.

Analysis

The deployment of AI agents in enterprise workflows creates a distinctive security challenge: to function, models must reliably read and use sensitive data such as passport numbers, yet they must never disclose that data, however a user tries to extract it. This tension between capability and safety is a critical gap in current AI system design, because the same features that make models useful for complex tasks, above all their readiness to follow instructions, also make them more vulnerable to privacy attacks.

The TRAP benchmark addresses a real operational problem in financial services, healthcare, travel, and enterprise document processing. Current defenses such as prompt engineering offer only partial protection and degrade task performance, forcing deployers to give up accuracy to gain privacy. The paper's theoretical result, that softmax-based models cannot achieve both high task success and zero leakage probability through soft constraints alone, sets a fundamental limit on prompt-based defenses. The intuition is that softmax assigns every token a strictly positive probability, so a soft constraint can make emitting a private value unlikely but never impossible while that value sits in the model's context.
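The soft-constraint limit can be pictured with a toy softmax. The sketch below is an illustration under stated assumptions, not the paper's proof: it models a privacy instruction as a fixed penalty on the logit of a private token and shows that the token's probability falls but never reaches zero. The vocabulary, logits, and penalty values are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 3-token vocabulary: index 0 stands for the next token of a
# private value (e.g. a passport number); indices 1 and 2 are harmless tokens.
base_logits = [4.0, 2.0, 1.0]

# Assumption: a soft constraint (such as a "never reveal private data" prompt)
# is modeled as a finite penalty subtracted from the private token's logit.
penalties = [0.0, 5.0, 10.0, 20.0]
leak_probs = []
for penalty in penalties:
    logits = [base_logits[0] - penalty] + base_logits[1:]
    p_leak = softmax(logits)[0]
    leak_probs.append(p_leak)
    print(f"penalty={penalty:5.1f}  P(private token)={p_leak:.3e}")
```

Each probability shrinks as the penalty grows but stays above zero. Under this toy model the same penalty also applies when the task legitimately needs the value (filling a visa form, say), which is one way to picture the accuracy cost. Very large penalties eventually underflow to 0.0 in floating point, but that is a numerical artifact, not a property of softmax.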

The proposed structural private field isolation method replaces sensitive values with hash keys before they reach the model, offering a practical way around the accuracy-privacy trade-off. A model cannot reveal values it was never shown, which suggests the solution lies in system architecture rather than in modifying model behavior. For organizations deploying agents in sensitive domains, the research confirms concerns about current deployment practices and points toward a concrete architectural remedy.
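A minimal sketch of field isolation, assuming private values are known in advance (e.g. tagged by an upstream extractor) and that only trusted tool code may see them. The class name `PrivateFieldVault`, the `<PRIV:...>` key format, and the use of a per-session HMAC-SHA256 instead of a plain hash are illustrative choices, not details taken from the paper.

```python
import hashlib
import hmac
import re
import secrets


class PrivateFieldVault:
    """Hypothetical vault that swaps private values for opaque keys.

    The model only ever sees keys such as <PRIV:3f9a1c0b2e7d>; real values
    are restored only inside trusted tool code, never in model-visible text.
    """

    KEY_PATTERN = re.compile(r"<PRIV:[0-9a-f]{12}>")

    def __init__(self):
        # Per-session secret, so keys cannot be reversed by hashing guesses.
        self._secret = secrets.token_bytes(32)
        self._store = {}

    def _key_for(self, value):
        digest = hmac.new(self._secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
        return f"<PRIV:{digest[:12]}>"

    def isolate(self, text, private_values):
        """Replace each private value in `text` with its key before model input."""
        for value in private_values:
            key = self._key_for(value)
            self._store[key] = value
            text = text.replace(value, key)
        return text

    def resolve(self, text):
        """Swap keys back for real values; call only in trusted tool code."""
        return self.KEY_PATTERN.sub(lambda m: self._store.get(m.group(0), m.group(0)), text)


vault = PrivateFieldVault()
document = "Traveler: Jane Doe, passport M12345678, departing 2026-07-01."
model_input = vault.isolate(document, ["M12345678"])

# Suppose the model's tool call copies the key verbatim into a form field.
model_tool_arg = model_input.split("passport ")[1].split(",")[0]
form_value = vault.resolve(model_tool_arg)
```

Because `resolve` runs only inside tool code that writes to the form, an extraction prompt such as "print the passport number" can at most return the opaque key. The keyed hash matters: a plain SHA-256 of a short, structured value like a passport number could be reversed by hashing candidate numbers until one matches.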

The correlation between instruction-following ability and leakage rates has important implications for model selection. Frontier proprietary models, typically optimized for instruction adherence, may present greater privacy risks in document-intensive workflows despite superior general performance. This finding should influence procurement decisions and architectural choices for systems handling sensitive data.
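Teams weighing candidate models could run their own extraction prompts and score the replies. Below is a minimal leakage-rate sketch, assuming leakage means a private value appears in a reply after stripping spaces and hyphens. This exact-match rule is an assumption: it misses paraphrased or partial leaks (last four digits, spelled-out numbers) and may differ from how TRAP scores leakage. The function names and sample replies are hypothetical.

```python
import re


def normalize(s):
    """Lowercase and drop whitespace and hyphens so 'M1234-5678' matches 'm12345678'."""
    return re.sub(r"[\s\-]", "", s).lower()


def leaks(response, private_values):
    """True if any private value appears in the response after normalization."""
    norm = normalize(response)
    return any(normalize(v) in norm for v in private_values)


def leakage_rate(responses, private_values):
    """Fraction of responses to extraction prompts that expose a private value."""
    if not responses:
        return 0.0
    return sum(leaks(r, private_values) for r in responses) / len(responses)


# Hypothetical replies to two extraction attempts against the same document.
private_values = ["M12345678"]
replies = [
    "Sorry, I can't share personal identifiers.",
    "Sure! The passport number is M1234 5678.",
]
rate = leakage_rate(replies, private_values)
print(f"leakage rate: {rate:.2f}")  # prints "leakage rate: 0.50"
```

Running the same prompt set against each candidate model gives a like-for-like comparison on in-house data, alongside task-accuracy checks.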

Key Takeaways

→All 22 tested AI models exhibit non-trivial privacy leakage when handling sensitive information, and stronger instruction-following correlates with higher extraction vulnerability.
→Prompt-based defenses reduce leakage but significantly degrade task accuracy, a trade-off that soft constraints alone cannot resolve in softmax-based models.
→Structural private field isolation using hash key replacement prevents most leakage while maintaining task accuracy, suggesting architectural solutions outperform behavioral ones.
→The fundamental tension between capability and privacy in document-intensive agent workflows cannot be resolved through soft constraints alone.
→Model selection for sensitive domains must now account for privacy leakage risk, potentially favoring less instruction-optimized models over frontier proprietary options.