AINeutralarXiv – CS AI · 6h ago6/10
🧠
When No Benchmark Exists: Validating Comparative LLM Safety Scoring Without Ground-Truth Labels
Researchers propose a framework for comparing language models on safety without labeled benchmark data, introducing SimpleAudit as a validation tool that uses controlled contrasts and variance analysis to establish model safety rankings. The study demonstrates that comparative safety scores are inherently context-dependent, requiring detailed reporting of methods rather than single rankings.