🧠 AI | ⚪ Neutral | Importance 6/10

Automated Concept Discovery for LLM-as-a-Judge Preference Analysis

arXiv – CS AI | James Wedgwood, Chhavi Yadav, Virginia Smith | March 5, 2026 at 05:00 AM

🤖 AI Summary

Researchers developed automated methods for discovering biases in large language models (LLMs) used as judges, analyzing over 27,000 paired responses. The study found that LLM judges exhibit systematic biases: they prefer refusals of sensitive requests more often than humans do, favor concrete and empathetic responses, and are biased against certain kinds of legal guidance.

Key Takeaways

→ Sparse autoencoder-based approaches recover more interpretable preference features than alternative methods while remaining competitive at predicting LLM decisions.
→ LLM judges prefer responses that refuse sensitive requests at higher rates than human evaluators do.
→ LLM judges favor responses that emphasize concreteness and empathy in new situations.
→ LLM judges prefer detail and formality in academic advice.
→ The research enables systematic analysis of LLM judge preferences without requiring predefined bias categories.