Invisible to Humans, Triggered by Agents: Stealthy Jailbreak Attacks on Mobile Vision-Language Agents
Researchers have discovered a new class of jailbreak vulnerability in mobile vision-language agents: malicious prompts that remain invisible to human users but are triggered during autonomous agent interactions. Using an optimization method called HG-IDA*, attackers achieve 82.5% planning and 75.0% execution hijack rates against GPT-4o by exploiting the absence of touch signals during agent operation, exposing a critical security gap in deployed mobile AI systems.
This research exposes a fundamental asymmetry in how vision-language models serve humans versus autonomous agents on mobile devices. The attack exploits the fact that automated agents generate near-zero touch-contact signals, creating an invisible window in which malicious visual prompts can execute undetected. Traditional jailbreak attempts require persistent visual manipulations that users would notice; this new paradigm separates agent perception from human perception entirely, making detection significantly harder.
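The paper's rendering details are not reproduced here, but the perception gap it exploits can be illustrated with a toy model: a payload encoded as pixel offsets far below human contrast thresholds is invisible on screen, yet an agent consuming the raw screenshot buffer recovers it exactly. All names, values, and the just-noticeable-difference threshold below are illustrative assumptions, not the paper's method.

```python
# Toy model of the human/agent perception gap: a "stealth" overlay perturbs
# background pixels by an amount below human contrast sensitivity, yet an
# agent reading raw screenshot pixels recovers the payload losslessly.
# BACKGROUND, STEALTH_DELTA, and jnd are illustrative values.

BACKGROUND = 250          # near-white UI background (0-255 grayscale)
STEALTH_DELTA = 2         # tiny offset: exact in the pixel buffer

def embed_stealth_bits(bits):
    """Encode prompt bits as tiny pixel offsets on the background."""
    return [BACKGROUND - STEALTH_DELTA * b for b in bits]

def agent_reads(pixels):
    """An agent consuming raw pixels recovers the payload exactly."""
    return [1 if p < BACKGROUND else 0 for p in pixels]

def human_perceives(pixels, jnd=8):
    """Humans cannot distinguish offsets below a just-noticeable difference."""
    return [0 if abs(p - BACKGROUND) < jnd else 1 for p in pixels]

payload = [1, 0, 1, 1, 0, 0, 1]
screen = embed_stealth_bits(payload)
assert agent_reads(screen) == payload          # agent sees the payload
assert human_perceives(screen) == [0] * 7      # human sees blank background
```

The same asymmetry applies to any channel where the agent's input pipeline has higher fidelity than human perception, which is why content-level filtering alone does not close the gap.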
The vulnerability stems from the rapid deployment of large vision-language models (LVLMs) as mobile agents without adequate consideration of realistic threat models. As AI systems move beyond controlled lab environments onto personal devices that handle sensitive user data and cross-app actions, the interaction surface expands dramatically. HG-IDA*, a one-shot optimization method, demonstrates how attackers can systematically bypass safety filters through prompt engineering tailored specifically for agent exploitation.
The 82.5% planning hijack rate represents a severe gap: agents can be reliably tricked into planning unauthorized actions before execution even begins. For users and developers, this suggests that current safety mechanisms in vision-language models are fundamentally inadequate for autonomous mobile scenarios. The attack succeeds because existing defenses focus on visible content manipulation rather than on the interaction patterns themselves.
Moving forward, the security community must prioritize interaction-level signals as a defense mechanism: systems need to detect when an agent perceives and responds to visual input differently from a human. This finding will likely accelerate industry discussions around agent sandboxing, permission models, and behavioral verification systems that validate whether agent actions align with authentic user intent.
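One concrete form such an interaction-level defense could take is a monitor that flags UI actions not preceded by a genuine touch-contact event. The event schema, field names, and the 500 ms window below are hypothetical assumptions for illustration, not an API from the paper or from any mobile OS.

```python
# Hypothetical sketch of an interaction-level defense: flag UI actions that
# lack a recent human touch-contact event, i.e. are likely agent-driven.
# Event kinds, fields, and WINDOW_MS are illustrative assumptions.

WINDOW_MS = 500  # max gap between a human touch and the action it authorizes

def flag_synthetic_actions(events):
    """Return names of actions not authorized by a recent touch-down.

    `events` is a time-ordered list of dicts such as
    {"t": 1200, "kind": "touch_down"} or
    {"t": 1300, "kind": "action", "name": "tap_send"}.
    """
    last_touch = None
    flagged = []
    for ev in events:
        if ev["kind"] == "touch_down":
            last_touch = ev["t"]
        elif ev["kind"] == "action":
            if last_touch is None or ev["t"] - last_touch > WINDOW_MS:
                flagged.append(ev["name"])
    return flagged

trace = [
    {"t": 0, "kind": "touch_down"},
    {"t": 120, "kind": "action", "name": "open_app"},       # human-authorized
    {"t": 4000, "kind": "action", "name": "send_payment"},  # no recent touch
]
assert flag_synthetic_actions(trace) == ["send_payment"]
```

A real deployment would need to distinguish legitimately delegated agent actions from hijacked ones, so a flag like this would gate sensitive actions behind re-confirmation rather than block them outright.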
- Mobile vision-language agents can be hijacked through invisible jailbreak prompts that never appear to human users, achieving up to 82.5% attack success rates
- The vulnerability exploits the lack of touch-contact signals during autonomous agent interactions, creating a detection-free attack window
- HG-IDA* optimization enables efficient one-shot jailbreak prompt construction that evades current LVLM safety filters
- Current AI safety mechanisms prioritize visible content manipulation over interaction-level anomalies, leaving agents exposed
- Cross-app action hijacking shows the threat extends beyond single-application compromise to system-wide unauthorized access
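The behavioral-verification idea raised above can be sketched as a pre-execution check that compares an agent's plan against the scope the user actually granted, which would also contain the cross-app hijacking noted in the last point. The intent schema and action names here are hypothetical, not a real permission API.

```python
# Hypothetical sketch of behavioral verification: before executing a planned
# sequence, reject any step outside the apps/actions the user authorized.
# The intent schema and step fields are illustrative assumptions.

def verify_plan(plan, user_intent):
    """Return (ok, violations) for a planned action sequence."""
    allowed_apps = set(user_intent["apps"])
    allowed_actions = set(user_intent["actions"])
    violations = [
        step for step in plan
        if step["app"] not in allowed_apps
        or step["action"] not in allowed_actions
    ]
    return (len(violations) == 0, violations)

intent = {"apps": {"calendar"}, "actions": {"read", "create_event"}}
plan = [
    {"app": "calendar", "action": "create_event"},  # within granted scope
    {"app": "banking", "action": "transfer"},       # cross-app hijack attempt
]
ok, bad = verify_plan(plan, intent)
assert not ok
assert bad == [{"app": "banking", "action": "transfer"}]
```

Because the check runs on the plan rather than on pixels, it is insensitive to how the injected prompt was hidden, addressing the planning-stage hijack directly.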