AI · Neutral · arXiv – CS AI · 14h ago · 6/10
Benchmarking Open-Source Safety Guard Models: A Comprehensive Evaluation
Researchers evaluated 14 open-source safety guard models on 79,331 samples and found that smaller models such as Qwen Guard (4B parameters) significantly outperform larger counterparts at detecting harmful content, achieving 83.97% recall versus just 25% for some 20B-parameter models. The study finds that model size does not correlate with safety detection performance, and that recall, which measures how little harmful content is missed, is the critical metric for production deployments.
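For readers unfamiliar with the metric, recall is the share of truly harmful samples that a guard model actually flags: true positives divided by true positives plus false negatives. The sketch below illustrates that arithmetic only; the function name and the toy labels are hypothetical and are not taken from the study's data or code.

```python
def recall(labels, predictions):
    """Fraction of truly harmful samples that the guard model flagged.

    labels      -- True where a sample is actually harmful
    predictions -- True where the guard model flagged the sample
    """
    true_pos = sum(1 for y, p in zip(labels, predictions) if y and p)
    false_neg = sum(1 for y, p in zip(labels, predictions) if y and not p)
    harmful = true_pos + false_neg
    # Guard against division by zero when no sample is harmful.
    return true_pos / harmful if harmful else 0.0


# Hypothetical toy data (not from the paper): four harmful samples,
# of which the guard catches only one, plus two benign samples.
labels = [True, True, True, True, False, False]
preds = [True, False, False, False, False, True]

print(recall(labels, preds))  # 0.25
```

Note that the false alarm on the last benign sample does not change recall; it would lower precision instead, which is why the two metrics are usually reported together.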
🧠 Llama