AI · Bearish · arXiv – CS AI · 7h ago · 7/10
Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code
Researchers have found that Grammar-Constrained Decoding (GCD), a technique used to improve code safety in Large Language Models, can be exploited as a jailbreak vector, an attack they call CodeSpear. The study also introduces CodeShield, a defensive alignment method that keeps LLMs from generating malicious code even when attackers manipulate the grammar constraints.