AI · Bearish · arXiv – CS AI · 18h ago · 7/10
Safety is Contextual, LLM-Judges Are Not: Navigating the Rigid Priors of Evaluators
A new research paper finds that LLM-based safety judges, widely used to evaluate AI safety at scale, have significant blind spots: they struggle to adapt their evaluations when given new contextual information or alternative safety definitions that conflict with their internal priors. This limitation undermines confidence in current safety evaluation methodologies across the AI industry.