Adversarial Feeds Steer LLM Agent Decisions Against Their Defaults
Researchers demonstrate that LLM agents' decisions can be systematically manipulated through adversarial feed curation: the ordering and composition of the information sources agents consume before acting. Testing 2,785 decision rollouts across four open-source LLMs, they found that feeds can shift genuinely uncertain decisions from 5% to 100% in one direction but cannot override firmly held model defaults. The result locates a critical safety vulnerability in the upstream ranker layer rather than in the model itself.
This research exposes a structural vulnerability in how LLM agents process external information before making decisions. Rather than attacking the model's core reasoning or final prompts, adversaries can manipulate the ranked feeds—social media posts, search results, retrieval contexts—that precede agent actions. The study's controlled protocol isolates feed effects from model capabilities, demonstrating that information curation acts as a "default-bounded control surface" that shifts marginal decisions dramatically while leaving strong priors intact.
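To make the idea of a curated feed concrete, here is a minimal sketch of a ranker that controls how one-sided a feed is. It is not the study's protocol: the `FeedItem` type, the `curate_feed` function, the stance labels, and the `dose` parameter (the fraction of items favoring the action) are all hypothetical names introduced for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedItem:
    text: str
    stance: int  # hypothetical label: +1 favors the action, -1 opposes it, 0 is neutral


def curate_feed(pool, dose, size, seed=0):
    """Build a feed of `size` items in which a fraction `dose` favors the action.

    Favoring items are ranked first, so both composition and ordering are
    controlled by the ranker and never by the agent's prompt.
    """
    if not 0.0 <= dose <= 1.0:
        raise ValueError("dose must be between 0 and 1")
    rng = random.Random(seed)  # seeded for reproducible feeds
    favoring = [item for item in pool if item.stance > 0]
    others = [item for item in pool if item.stance <= 0]
    n_favoring = round(dose * size)
    n_others = size - n_favoring
    if n_favoring > len(favoring) or n_others > len(others):
        raise ValueError("pool is too small for the requested feed")
    return rng.sample(favoring, n_favoring) + rng.sample(others, n_others)


# Illustrative pool: 10 posts favoring the action and 10 opposing it.
pool = [FeedItem(f"pro post {i}", +1) for i in range(10)]
pool += [FeedItem(f"con post {i}", -1) for i in range(10)]

one_sided = curate_feed(pool, dose=1.0, size=5)
mixed = curate_feed(pool, dose=0.4, size=5)
```

Sweeping `dose` from 0 to 1 while holding the model and the final prompt fixed is one way to isolate the feed's contribution to the agent's decision.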
The findings arrive amid growing deployment of autonomous agents that interact with dynamic, ranked information streams. While most safety research focuses on model robustness or prompt engineering, this work identifies the often-overlooked ranker as a critical attack vector. Across multiple domains, including security-relevant decisions such as removing deployment gates or relaxing access controls, one-sided feeds consistently influence uncertain agents. The dose-response relationship and the statistical significance (Fisher p as low as 3×10^-10) indicate that the effect is systematic rather than random noise.
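For readers unfamiliar with the Fisher exact test cited above, the sketch below computes a two-sided p-value for a 2×2 table using only the standard library. The counts are illustrative, chosen to mirror a 5% versus 100% decision rate with 20 rollouts per condition. They are an assumption, not data taken from the study, and `fisher_exact_two_sided` is a hypothetical helper name.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Illustrative counts (assumed): rows are feed conditions,
# columns are (agent took the action, agent did not).
neutral_feed = (1, 19)     # 1 of 20 rollouts, i.e. 5%
one_sided_feed = (20, 0)   # 20 of 20 rollouts, i.e. 100%

p_value = fisher_exact_two_sided(*neutral_feed, *one_sided_feed)
print(f"p = {p_value:.2e}")
```

With these assumed counts the p-value is on the order of 10^-10, which shows how a shift of that size across a few dozen rollouts is very unlikely to arise by chance.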
For developers and organizations deploying LLM agents, this means safety audits must extend beyond model testing to feed-layer defenses. That some simple mitigations exist, and that frontier models retain stronger defaults, suggests architectural choices matter. This particularly affects industries that rely on autonomous decision-making in security, financial, or policy contexts, where feed manipulation could have material consequences. The research does not establish that constructing adversarial feeds in the real world is trivial, but it shows that the attack surface exists and that exploitability varies with model robustness.
- LLM agent decisions are vulnerable to manipulation through adversarial feed curation, which can shift uncertain decisions from 5% to 100% in one direction
- Information rankers upstream of agent prompts represent an overlooked safety surface that current evaluations fail to audit
- The effect is dose-dependent, generalizes across decision domains, and persists even when controlling for writing-style artifacts
- Simple feed-level defenses exist and frontier models show greater resilience, suggesting architectural robustness is achievable
- Security-relevant agent decisions, such as access control modifications, are within the scope of feed-based manipulation