#threat-modeling News & Analysis

12 articles tagged with #threat-modeling. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Jun 25 · 7/10

What Does It Mean to Break a Distillation Defense?

Researchers propose a formal threat model framework for evaluating distillation defenses against black-box LLM attacks, arguing that existing output perturbation defenses lack clear specifications about attacker capabilities. The work demonstrates that defense effectiveness depends heavily on assumed threat parameters, raising concerns about false security claims in deployed systems.

AI · Bearish · arXiv – CS AI · Jun 8 · 7/10

From Privacy to Workflow Integrity: Communication-Graph Metadata in Autonomous Agent Interoperability

Researchers identify a critical vulnerability in agent-interoperability protocols like A2A and MCP: while message content is encrypted, the communication metadata revealing which agents contact each other, when, and how often exposes pending workflows and enables adversaries to predict and preempt autonomous actions. The study demonstrates that observers can infer task classes from metadata patterns alone and that metadata-protecting transports significantly reduce this predictive leverage.

AI · Bearish · MIT Technology Review · Jun 5 · 7/10

The Download: AI hacking beyond Mythos, and chatbots’ impact on our brains

Attackers exploited Meta's AI customer support agent to compromise Instagram accounts, revealing critical security vulnerabilities in AI systems beyond existing frameworks like Mythos. The incident demonstrates that AI security requires comprehensive threat modeling across all deployment vectors, not just isolated technical safeguards.

AI · Bearish · arXiv – CS AI · Jun 2 · 7/10

Hidden Thoughts Are Not Secret: Reasoning Trace Exposure in LLMs

Researchers demonstrate that reasoning traces hidden by large language models can be exposed through Reasoning Exposure Prompting (REP), a technique using shadow-model demonstrations to elicit internal reasoning through prompts. This finding challenges the security assumptions of deployed reasoning systems that intentionally conceal their internal processes from users.

AI · Bearish · arXiv – CS AI · May 28 · 7/10

Position: Retire the "Positive Backdoor" Label -- Secret Alignment Requires Strict and Systematic Evaluation

A position paper argues that the AI/ML community should abandon the "positive backdoor" terminology and instead rigorously evaluate trigger-activated hidden behaviors as "Secret Alignment." The authors found that existing implementations are brittle in their security properties, particularly confidentiality, integrity, and availability, which suggests that protective claims lack standardized evaluation frameworks.

AI · Neutral · arXiv – CS AI · May 12 · 7/10

MATRA: Modeling the Attack Surface of Agentic AI Systems -- OpenClaw Case Study

Researchers introduce MATRA, a threat modeling framework designed to systematically assess security risks in autonomous AI agent systems. The framework combines asset-based impact analysis with attack trees to quantify how LLM vulnerabilities translate into real-world deployment risks, demonstrating its effectiveness on an OpenClaw personal agent case study.
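
As a rough illustration of how an attack tree can be combined with an asset impact value, the sketch below scores a toy tree. It is not MATRA's actual method: the `Node` class, the AND/OR gate rules, the independence assumption between leaves, the leaf probabilities, and the impact scale are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a toy attack tree (hypothetical structure)."""
    name: str
    gate: str = "LEAF"          # "LEAF", "AND", or "OR"
    probability: float = 0.0    # only read for LEAF nodes
    children: List["Node"] = field(default_factory=list)


def success_probability(node: Node) -> float:
    """Estimate attack success, assuming independent leaf events."""
    if node.gate == "LEAF":
        return node.probability
    child_probs = [success_probability(c) for c in node.children]
    if node.gate == "AND":
        # Every child step must succeed.
        result = 1.0
        for p in child_probs:
            result *= p
        return result
    if node.gate == "OR":
        # At least one child path must succeed.
        miss = 1.0
        for p in child_probs:
            miss *= 1.0 - p
        return 1.0 - miss
    raise ValueError(f"unknown gate: {node.gate}")


def risk_score(root: Node, asset_impact: float) -> float:
    """Risk = likelihood of reaching the root goal x impact on the asset."""
    return success_probability(root) * asset_impact


# Made-up example: exfiltrating files through a personal agent.
tree = Node("exfiltrate user files", "OR", children=[
    Node("abuse agent tooling", "AND", children=[
        Node("prompt injection succeeds", probability=0.5),
        Node("agent has file access", probability=0.4),
    ]),
    Node("stolen credentials", probability=0.1),
])

likelihood = success_probability(tree)          # roughly 0.28
score = risk_score(tree, asset_impact=10.0)     # roughly 2.8
```

The AND branch multiplies its steps, and the OR root combines the two paths, so lowering any single leaf probability (for example by restricting file access) visibly lowers the final score.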

AI · Bearish · arXiv – CS AI · May 1 · 7/10

From Prompt to Physical Actuation: Holistic Threat Modeling of LLM-Enabled Robotic Systems

Researchers present the first comprehensive threat modeling of LLM-enabled robotic systems, mapping three categories of attacks (cyber, adversarial, and conversational) across the perception-planning-actuation pipeline. The analysis reveals critical architectural vulnerabilities where compromised inputs or unsafe model outputs can propagate to unsafe physical actions without proper validation boundaries.

AI · Bearish · arXiv – CS AI · Apr 20 · 7/10

Security Threat Modeling for Emerging AI-Agent Protocols: A Comparative Analysis of MCP, A2A, Agora, and ANP

Researchers present a systematic security analysis of four emerging AI agent communication protocols (MCP, A2A, Agora, ANP), identifying twelve protocol-level risks and demonstrating critical vulnerabilities in validation mechanisms. The study provides the first standardized threat modeling framework for AI agent ecosystems, revealing that current protocols lack adequate security guardrails for cross-organizational interoperability.

AI · Bearish · arXiv – CS AI · Apr 10 · 7/10

Physical Adversarial Attacks on AI Surveillance Systems: Detection, Tracking, and Visible--Infrared Evasion

This research paper examines physical adversarial attacks on AI surveillance systems through a surveillance-oriented lens, emphasizing that robustness cannot be assessed from isolated image benchmarks alone. The study highlights critical gaps in current evaluation practices, including temporal persistence across frames, multi-modal sensing (visible and infrared), realistic attack carriers, and system-level objectives that must be tested under actual deployment constraints.

AI · Bearish · arXiv – CS AI · Mar 11 · 7/10

Security Considerations for Multi-agent Systems

A comprehensive study reveals that multi-agent AI systems (MAS) face distinct security vulnerabilities that existing frameworks inadequately address. The research evaluated 16 AI security frameworks against 193 identified threats across 9 categories, finding that no framework achieves majority coverage in any single category, with non-determinism and data leakage being the most under-addressed areas.
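
To make the "majority coverage" criterion concrete, the sketch below computes per-category coverage for a single framework and checks whether any category exceeds 50%. The threat IDs, category assignments, and function names are hypothetical placeholders, not data or tooling from the study.

```python
from collections import defaultdict
from typing import Dict, Set


def coverage_by_category(threats: Dict[str, str],
                         covered_ids: Set[str]) -> Dict[str, float]:
    """Return the fraction of threats a framework covers in each category.

    `threats` maps a threat ID to its category; `covered_ids` holds the
    threat IDs that the framework addresses.
    """
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for threat_id, category in threats.items():
        totals[category] += 1
        if threat_id in covered_ids:
            hits[category] += 1
    return {category: hits[category] / totals[category] for category in totals}


def has_majority_anywhere(coverage: Dict[str, float]) -> bool:
    """True if the framework covers more than half of some category."""
    return any(fraction > 0.5 for fraction in coverage.values())


# Made-up threat catalogue with two categories.
threats = {
    "T1": "data leakage",
    "T2": "data leakage",
    "T3": "non-determinism",
    "T4": "non-determinism",
    "T5": "non-determinism",
}
covered = {"T1", "T3"}  # threats one hypothetical framework addresses

coverage = coverage_by_category(threats, covered)
majority = has_majority_anywhere(coverage)
```

In this toy catalogue the framework covers exactly half of one category and a third of the other, so it has no majority coverage anywhere, which is the pattern the study reports for all 16 frameworks.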

AI · Neutral · arXiv – CS AI · Mar 5 · 7/10

Goal-Driven Risk Assessment for LLM-Powered Systems: A Healthcare Case Study

Researchers propose a new goal-driven risk assessment framework for LLM-powered systems, specifically targeting healthcare applications. The approach uses attack trees to identify detailed threat vectors combining adversarial AI attacks with conventional cyber threats, addressing security gaps in LLM system design.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

One Probe Won't Catch Them All: Towards Targeted Deception Detection

Researchers demonstrate that universal linear probes for detecting AI deception are fundamentally limited, achieving only modest performance improvements. The study reveals deception detection requires type-specific probes tailored to particular threat models rather than single universal detectors, with performance varying significantly based on instruction pair design.