🧠 AI🔴 BearishImportance 7/10

HARP: Measuring Harm Amplification in Multi-Agent LLM Systems

arXiv – CS AI|Md Hafizur Rahman, Zafaryab Haider, Tanzim Mahfuz, Prabuddha Chakraborty|May 28, 2026 at 04:00 AM

🤖AI Summary

Researchers introduce HARP, a methodology for measuring how harm propagates across multi-agent LLM systems when one component is compromised. Testing on a finance-oriented seven-agent system reveals that single-agent compromise creates the strongest amplification effects, while existing defenses struggle to balance security with utility costs.

Analysis

Multi-agent LLM systems represent a significant architectural shift in AI deployment, distributing complex workflows across specialized components with shared memory and decision gates. HARP addresses a critical blind spot in current security evaluation: measuring not just whether attacks succeed, but how localized compromises cascade into system-wide failures. This distinction matters because a bounded perturbation to a single agent can become amplified as other agents process and propagate corrupted outputs, creating disproportionate harm relative to the initial attack surface.

The research demonstrates that financial AI systems face particular vulnerability through specialist compromise and shared-context corruption vectors. While prompt-only defenses preserve system utility, they fail to meaningfully reduce attack success rates. More aggressive interventions like IntegrityGuard reduce harm but introduce unacceptable latency or token cost penalties—a fundamental trade-off that constrains practical deployment.

For AI developers and financial institutions, HARP establishes that traditional single-agent robustness testing is insufficient for multi-agent architectures. Attacks don't require compromising all agents; compromising one strategic component can trigger system-wide degradation through normal orchestration flows. This insight reshapes security requirements: orchestration patterns themselves become attack surfaces requiring specialized defense mechanisms.

Looking forward, the field must move beyond binary attack-success metrics toward propagation-aware evaluation frameworks. Organizations deploying multi-agent financial AI systems should expect that current defenses require significant utility trade-offs, and may need custom orchestration architectures or trust boundaries rather than relying solely on component-level hardening.

Key Takeaways

→HARP introduces harm amplification metrics showing how single-agent compromise propagates across multi-agent LLM systems in non-linear ways.
→Shared-context corruption and temporal persistence attacks achieve highest success rates despite current defense mechanisms.
→Prompt-only defenses fail to reduce attack success while preserving benign utility, creating a false security illusion.
→IntegrityGuard trace-consistency defense reduces harm but imposes significant latency and token cost penalties on normal operations.
→Multi-agent financial AI systems require orchestration-aware security models beyond traditional single-agent robustness evaluation.