#agent-security News & Analysis

10 articles tagged with #agent-security. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 3d ago · 7/10

AIRGuard: Guarding Agent Actions with Runtime Authority Control

AIRGuard is a runtime security framework that protects AI agents from authority confusion attacks, where attackers manipulate untrusted context to misuse authorized tool access. The system reduces attack success rates from 36.3% to 5.5% while maintaining 76% of benign functionality, outperforming existing defense mechanisms by enforcing least-privilege authorization at execution time.

🧠 Haiku · 🧠 Sonnet

AI · Bearish · arXiv – CS AI · May 12 · 7/10

When Agents Overtrust Environmental Evidence: An Extensible Agentic Framework for Benchmarking Evidence-Grounding Defects in LLM Agents

Researchers introduce EnvTrustBench, a benchmarking framework that identifies evidence-grounding defects (EGDs) in LLM agents—failures where agents act on stale, incorrect, or malicious environmental data without verification. Testing across 6 LLM backbones and 5 agent scaffolds reveals consistent vulnerabilities, exposing a critical reliability gap in agent systems that increasingly interact with real-world APIs, files, and logs.

AI · Bullish · arXiv – CS AI · May 7 · 7/10

AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use

AgentTrust is a runtime safety layer that intercepts AI agent tool calls before execution to prevent unsafe actions like accidental deletion, credential exposure, or data exfiltration. The system achieves 95-96.7% verdict accuracy across benchmarks using deobfuscation, risk chain detection, and LLM-based judgment, addressing a critical gap in AI agent safety infrastructure.
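
The interception idea in this summary can be illustrated with a minimal sketch. It is not AgentTrust's implementation: the rule list, `judge`, `guarded`, and `run_shell` are hypothetical names chosen for illustration, and the deobfuscation, risk chain detection, and LLM-based judgment mentioned above are all omitted. It only shows the general shape of checking a tool call before it runs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    reason: str


# Hypothetical deny rules; a real system would use far richer analysis.
BLOCKED_SUBSTRINGS = ("rm -rf", "AWS_SECRET_ACCESS_KEY")


def judge(tool_name: str, args: Dict[str, Any]) -> Verdict:
    """Return a verdict for a pending tool call based on its arguments."""
    text = " ".join(str(v) for v in args.values())
    for needle in BLOCKED_SUBSTRINGS:
        if needle in text:
            return Verdict(False, f"argument contains {needle!r}")
    return Verdict(True, "no rule matched")


def guarded(tool_name: str, tool: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a tool so every call is judged before the tool body executes."""
    def wrapper(**kwargs: Any) -> Any:
        verdict = judge(tool_name, kwargs)
        if not verdict.allowed:
            raise PermissionError(f"{tool_name} blocked: {verdict.reason}")
        return tool(**kwargs)
    return wrapper


def run_shell(command: str) -> str:
    """Stand-in tool that only pretends to run a command."""
    return f"(pretend) ran: {command}"


safe_shell = guarded("run_shell", run_shell)
```

Calling `safe_shell(command="ls")` passes through to the stand-in tool, while `safe_shell(command="rm -rf /")` raises `PermissionError` before the tool body is reached.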

AI · Bullish · arXiv – CS AI · Apr 20 · 7/10

Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility

Researchers present symbolic guardrails as a practical approach to enforce safety and security constraints on AI agents that use external tools. Analysis of 80 benchmarks reveals that 74% of policy requirements can be enforced through symbolic guardrails without reducing agent effectiveness, addressing a critical gap in AI safety for high-stakes applications.

AI · Bearish · arXiv – CS AI · Apr 14 · 7/10

The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents

Researchers identify a critical safety vulnerability in computer-use agents (CUAs): benign user instructions can lead to harmful outcomes because of environmental context or execution flaws. On the OS-BLIND benchmark, frontier AI models, including Claude 4.5 Sonnet, show 73-93% attack success rates under these conditions, and multi-agent deployments amplify the vulnerability because decomposed tasks obscure harmful intent from safety systems.

🧠 Claude

AI · Bearish · arXiv – CS AI · Apr 10 · 7/10

Invisible to Humans, Triggered by Agents: Stealthy Jailbreak Attacks on Mobile Vision-Language Agents

Researchers describe a new vulnerability in mobile vision-language agents: malicious prompts stay invisible to human users but are triggered during autonomous agent interactions. Using an optimization method called HG-IDA*, attackers achieve 82.5% planning and 75.0% execution hijack rates on GPT-4o by exploiting the lack of touch signals during agent operations, exposing a critical security gap in deployed mobile AI systems.

🧠 GPT-4

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

ILION: Deterministic Pre-Execution Safety Gates for Agentic AI Systems

Researchers introduce ILION, a deterministic safety system for autonomous AI agents that can execute real-world actions like financial transactions and API calls. The system achieves 91% precision with sub-millisecond latency, significantly outperforming existing text-safety infrastructure that wasn't designed for agent execution safety.

🏢 OpenAI · 🧠 Llama

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

Grimlock: Guarding High-Agency Systems with eBPF and Attested Channels

Grimlock is a security framework that uses eBPF and TLS 1.3 channel binding to enforce authorization and delegation controls in agentic AI systems without modifying application code. The system intercepts sandbox communications, validates identity through post-handshake attestation, and issues short-lived scope tokens to enable secure multi-cloud orchestration with transparent auditability.
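
To make "short-lived scope tokens" concrete, here is a minimal sketch of the general idea using only the Python standard library. It is an assumption-laden illustration, not Grimlock's design: the token layout, `SECRET`, `issue_token`, and `check_token` are hypothetical, and the eBPF interception, TLS 1.3 channel binding, and attestation steps are not modeled.

```python
import hashlib
import hmac
import time
from typing import Iterable, Optional

# Hypothetical demo key; a real deployment would never hard-code a secret.
SECRET = b"demo-only-secret"


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(subject: str, scopes: Iterable[str], ttl_seconds: int,
                now: Optional[float] = None) -> str:
    """Issue a signed token listing scopes and an expiry timestamp.

    Assumes subject and scope names contain neither '|' nor ','.
    """
    issued = time.time() if now is None else now
    expires = int(issued + ttl_seconds)
    payload = f"{subject}|{','.join(sorted(scopes))}|{expires}"
    return f"{payload}|{_sign(payload)}"


def check_token(token: str, required_scope: str,
                now: Optional[float] = None) -> bool:
    """Accept only an untampered, unexpired token that carries the scope."""
    current = time.time() if now is None else now
    parts = token.split("|")
    if len(parts) != 4:
        return False
    subject, scopes, expires, sig = parts
    expected = _sign(f"{subject}|{scopes}|{expires}")
    # Constant-time comparison of the signature bytes.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    if current >= int(expires):
        return False
    return required_scope in scopes.split(",")
```

A token issued with `ttl_seconds=60` for the scope `read:logs` is accepted for that scope within the minute and rejected afterwards, for any other scope, or if any field has been altered.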

AI · Neutral · arXiv – CS AI · 5d ago · 6/10

ChainCaps: Composition-Safe Tool-Using Agents via Monotonic Capability Attenuation

Researchers present ChainCaps, a runtime safety framework that prevents tool-using AI agents from exploiting composed services through 'permission laundering'—where an agent passes intermediate results through multiple tools to achieve unauthorized outcomes. The system uses capability budgets that propagate through tool chains via intersection, reducing attack success rates from 25-68% to 0-4.8% while maintaining 96-100% benign task completion across frontier models.
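
The phrase "capability budgets that propagate through tool chains via intersection" can be sketched as follows. This is a simplified illustration under stated assumptions, not the ChainCaps implementation: `Tagged`, `combine`, and `require` are hypothetical names, and capabilities are modeled as plain strings. The point it shows is that a derived value can never hold more capabilities than every one of its inputs, so permissions only shrink along a chain.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Tagged:
    """A value paired with the capabilities it may still be used for."""
    value: str
    caps: FrozenSet[str]


def combine(value: str, first: Tagged, *rest: Tagged) -> Tagged:
    """Derive a new value whose capabilities are the intersection of all inputs."""
    caps = first.caps
    for item in rest:
        caps = caps & item.caps
    return Tagged(value, caps)


def require(data: Tagged, cap: str) -> None:
    """Raise if the data no longer carries the capability a tool needs."""
    if cap not in data.caps:
        raise PermissionError(f"missing capability: {cap}")


# Hypothetical inputs: a trusted user note and an untrusted web page.
user_note = Tagged("meeting notes", frozenset({"read", "summarize", "send_email"}))
web_page = Tagged("untrusted page", frozenset({"read", "summarize"}))

merged = combine("summary of both", user_note, web_page)
```

Here `merged.caps` contains only `read` and `summarize`, so `require(merged, "send_email")` raises `PermissionError` even though one of the inputs originally carried that capability.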

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

Tracking Capabilities for Safer Agents

Researchers propose a new safety framework for AI agents using Scala 3 with capture checking to prevent information leakage and malicious behaviors. The system creates a 'safety harness' that tracks capabilities through static type checking, allowing fine-grained control over agent actions while maintaining task performance.