AIBullish · arXiv – CS AI · 5h ago · 7/10
AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use
AgentTrust is a runtime safety layer that intercepts AI agent tool calls before execution to prevent unsafe actions such as accidental deletion, credential exposure, or data exfiltration. Combining deobfuscation, risk-chain detection, and LLM-based judgment, the system achieves 95–96.7% verdict accuracy across benchmarks, addressing a critical gap in AI agent safety infrastructure.
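A minimal sketch of the interception pattern the summary describes: a wrapper sits between the agent and tool execution, normalizes (deobfuscates) the call, matches it against risk rules, and only executes if the verdict allows. All names, rules, and the `intercept` API here are illustrative assumptions, not the paper's actual implementation, and a real system would add the LLM-based judgment stage rather than rules alone.

```python
# Hypothetical sketch of runtime tool-call interception; names and
# risk rules are illustrative, not AgentTrust's actual API.
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCall:
    tool: str
    args: dict


@dataclass
class Verdict:
    allow: bool
    reason: str


def deobfuscate(text: str) -> str:
    # Toy normalization: lowercase and collapse whitespace so that
    # "rm   -RF /tmp" matches the same rule as "rm -rf /tmp".
    return " ".join(text.lower().split())


# Illustrative risk patterns standing in for risk-chain detection.
RISK_RULES: list[tuple[str, str]] = [
    ("rm -rf", "destructive deletion"),
    ("aws_secret", "credential exposure"),
    ("curl http", "possible data exfiltration"),
]


def judge(call: ToolCall) -> Verdict:
    # Scan the normalized call against risk rules before execution.
    # A full system would escalate ambiguous cases to an LLM judge.
    blob = deobfuscate(f"{call.tool} {call.args}")
    for pattern, reason in RISK_RULES:
        if pattern in blob:
            return Verdict(False, reason)
    return Verdict(True, "no risk pattern matched")


def intercept(call: ToolCall, execute: Callable[[ToolCall], str]) -> str:
    # Block the call if the verdict is unsafe; otherwise run it.
    verdict = judge(call)
    if not verdict.allow:
        return f"BLOCKED: {verdict.reason}"
    return execute(call)
```

For example, `intercept(ToolCall("shell", {"cmd": "rm -rf /tmp/x"}), run_shell)` would return a blocked verdict before `run_shell` ever executes, while a benign `ls` call passes through. The key design point is that the safety check happens at the tool boundary, so unsafe actions are stopped regardless of how the agent's reasoning produced them.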