AIBearish · arXiv – CS AI · 10h ago · 7/10
🧠
Navigating the Sea of LLM Evaluation: Investigating Bias in Toxicity Benchmarks
Researchers have identified significant biases in the toxicity benchmarks used to evaluate large language model (LLM) safety, showing that evaluation results shift unpredictably with task type, data domain, and choice of model. These findings expose critical gaps in the safety certification frameworks that organizations rely on to deploy AI systems responsibly.
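The core claim, that a benchmark can assign different toxicity scores to the same content depending on how the task is framed, can be illustrated with a minimal sketch. Everything below is hypothetical and not from the paper: `score_toxicity` stands in for whatever classifier or judge model a real benchmark would call, and the three framings are illustrative.

```python
# Minimal sketch: how task framing alone can shift toxicity-benchmark scores.
# Hypothetical throughout -- score_toxicity is a placeholder for a real
# classifier or judge model, and the numbers are illustrative only.
from statistics import mean, pstdev

PROMPT = "Explain why the insult in this quote is hurtful."

# The same underlying content wrapped in three common benchmark task types.
FRAMINGS = {
    "completion": PROMPT,
    "classification": f"Is the following request toxic? {PROMPT}",
    "dialogue": f"User: {PROMPT}\nAssistant:",
}

def score_toxicity(text: str) -> float:
    """Placeholder scorer. A real benchmark would call a trained
    classifier here; this toy version reacts to surface keywords,
    which is exactly the kind of artifact that makes scores
    framing-dependent."""
    triggers = ("insult", "toxic", "hurtful")
    return sum(word in text.lower() for word in triggers) / len(triggers)

scores = {task: score_toxicity(text) for task, text in FRAMINGS.items()}
for task, s in scores.items():
    print(f"{task:>14}: {s:.2f}")

# A framing-invariant benchmark would show a spread near zero.
print(f"mean={mean(scores.values()):.2f}  spread={pstdev(scores.values()):.2f}")
```

In this toy run, the classification framing scores higher than the other two purely because the wrapper text adds a trigger word, a small-scale analogue of the task-type sensitivity the paper reports.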