AINeutralarXiv โ CS AI ยท Mar 47/104
๐ง
Toward a Dynamic Stackelberg Game-Theoretic Framework for Agentic AI Defense Against LLM Jailbreaking
Researchers propose a game-theoretic framework using Stackelberg equilibrium and Rapidly exploring Random Trees to model interactions between attackers trying to jailbreak LLMs and defensive AI systems. The framework provides a mathematical foundation for understanding and improving AI safety guardrails against prompt-based attacks.