AIBearish · arXiv – CS AI · 9h ago · 7/10
🧠
OrchJail: Jailbreaking Tool-Calling Text-to-Image Agents by Orchestration-Guided Fuzzing
Researchers have developed OrchJail, a fuzzing framework that discovers vulnerabilities in tool-calling text-to-image AI agents by exploiting how sequences of individually benign tool calls combine into unsafe outputs. Unlike traditional prompt-injection attacks, OrchJail targets the orchestration layer, where the agent chains tools together, achieving higher attack success rates while evading existing defenses that inspect only single prompts or single tool calls.
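The core idea of searching the orchestration layer rather than individual prompts can be sketched as enumerating tool-chain orderings and checking whether any combined result trips a safety oracle, even though each step passes in isolation. The tool names, pipeline, and oracle below are all hypothetical stand-ins, not the paper's actual implementation:

```python
import itertools

# Hypothetical benign tool steps; a real agent would invoke
# actual image-editing tools (crop, inpaint, style transfer, ...).
BENIGN_STEPS = ["crop_face", "inpaint_region", "style_transfer", "upscale"]

def run_pipeline(chain):
    """Stub: 'execute' a tool chain and return a token for its output."""
    return "+".join(chain)

def unsafe(output):
    """Stub safety oracle: flags one specific combination of steps
    whose joint effect is assumed (for illustration) to be unsafe."""
    return "inpaint_region" in output and "style_transfer" in output

def fuzz_orchestrations(steps, max_len=3):
    """Enumerate tool orderings up to max_len and collect chains whose
    combined output trips the oracle despite each step being benign."""
    found = []
    for n in range(2, max_len + 1):
        for chain in itertools.permutations(steps, n):
            if unsafe(run_pipeline(chain)):
                found.append(chain)
    return found

hits = fuzz_orchestrations(BENIGN_STEPS)
print(f"unsafe chains found: {len(hits)}")
```

Exhaustive permutation here stands in for the guided search a real fuzzer would use; the point is that the vulnerable unit is the *chain*, so per-step or per-prompt filters never see the unsafe combination.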