AI · Neutral · arXiv – CS AI · 10h ago · 7/10
Single-Configuration Attack Success Rate Is Not Enough: Jailbreak Evaluations Should Report Distributional Attack Success
A research paper argues that jailbreak attack evaluations should report the distribution of attack success rates across parameter configurations rather than a single best-case figure. The authors propose two new metrics, Variant Sensitivity Measure (VSM) and Union Coverage (UC), and demonstrate that attacks reaching 81% coverage in their optimal configuration reach 100% coverage when all variants are tested together, which substantially changes the resulting threat assessment.
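
The gap between a best single configuration and the union over all variants can be illustrated with a small sketch. This summary does not give the paper's exact definitions, so the code assumes UC is the share of prompts jailbroken by at least one configuration; the variant names and outcome data are hypothetical toy values, not results from the paper. VSM is left out because its formula is not stated here.

```python
from typing import Dict, List


def per_config_asr(results: Dict[str, List[bool]]) -> Dict[str, float]:
    """Attack success rate of each configuration taken on its own."""
    return {name: sum(hits) / len(hits) for name, hits in results.items()}


def union_coverage(results: Dict[str, List[bool]]) -> float:
    """Assumed definition: fraction of prompts broken by at least one configuration."""
    # Transpose so that each tuple holds one prompt's outcomes across all configurations.
    per_prompt = list(zip(*results.values()))
    return sum(any(outcomes) for outcomes in per_prompt) / len(per_prompt)


# Hypothetical outcomes: one list per attack variant, one entry per prompt.
results = {
    "variant_a": [True, True, True, False, False],
    "variant_b": [False, True, True, True, False],
    "variant_c": [False, False, True, False, True],
}

asr = per_config_asr(results)
best_single = max(asr.values())
coverage = union_coverage(results)

print(f"best single-configuration ASR: {best_single:.0%}")
print(f"union coverage over all variants: {coverage:.0%}")
```

In this toy data no single variant breaks more than three of the five prompts, yet every prompt is broken by some variant, which is the kind of gap a single-configuration figure would hide.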