AI · Bullish · arXiv – CS AI · 15h ago · 7/10
Jailbreak susceptibility prediction and mitigation via the behavioral geometry of models
Researchers have developed a framework that uses behavioral geometry to predict which AI models are vulnerable to jailbreak attacks and to transfer defensive measures efficiently across model populations. The approach achieves 94% detection accuracy while cutting the number of evaluation probes by 98%, making security assessment practical across thousands of model configurations.