y0news
🧠 AI Neutral

Automated Concept Discovery for LLM-as-a-Judge Preference Analysis

arXiv – CS AI | James Wedgwood, Chhavi Yadav, Virginia Smith
🤖 AI Summary

Researchers developed automated methods for discovering the preferences and biases of large language models (LLMs) used as judges, analyzing more than 27,000 pairs of responses. The study found that LLM judges exhibit systematic biases: they favor refusing sensitive requests more than human evaluators do, prefer concrete and empathetic responses, and are biased against certain kinds of legal guidance.

Key Takeaways
  • Sparse autoencoder-based approaches recover more interpretable preference features than alternative methods while remaining competitive in predicting LLM decisions.
  • LLM judges favor responses that refuse sensitive requests more often than human evaluators do.
  • AI judges favor responses that emphasize concreteness and empathy when addressing new situations.
  • LLM evaluators prefer detail and formality in academic advice contexts.
  • The research enables systematic analysis of LLM judge preferences without requiring predefined bias categories.
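To make the sparse-autoencoder idea concrete, here is a minimal sketch of generic SAE-based concept discovery. It is not the authors' method: it assumes each pair is represented by the difference between two response embeddings, trains a one-layer ReLU sparse autoencoder on those differences with NumPy, and ranks latents by how strongly they correlate with the judge's choice. The function names (`train_sae`, `encode`, `rank_latents`), the embedding-difference input, and the synthetic "judge" are all hypothetical illustrations.

```python
import numpy as np


def train_sae(X, n_latents=16, l1=1e-3, lr=0.05, epochs=300, seed=0):
    """Fit a one-layer ReLU sparse autoencoder by full-batch gradient descent.

    X is (n, d); here assumed (hypothetically) to be
    embedding(response_A) - embedding(response_B) for each judged pair.
    Loss = mean squared reconstruction error + l1 * mean L1 norm of latents.
    Returns (params dict, list of per-epoch losses).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    p = {
        "W_enc": rng.normal(0.0, 0.1, (d, n_latents)),
        "b_enc": np.zeros(n_latents),
        "W_dec": rng.normal(0.0, 0.1, (n_latents, d)),
        "b_dec": np.zeros(d),
    }
    losses = []
    for _ in range(epochs):
        pre = X @ p["W_enc"] + p["b_enc"]
        Z = np.maximum(pre, 0.0)                   # sparse, non-negative codes
        err = Z @ p["W_dec"] + p["b_dec"] - X      # reconstruction error
        recon = np.mean(np.sum(err ** 2, axis=1))
        losses.append(float(recon + l1 * np.mean(np.sum(Z, axis=1))))
        # Manual backpropagation, using the current (pre-update) parameters.
        g_out = 2.0 * err / n
        g_Z = g_out @ p["W_dec"].T + l1 / n        # Z >= 0, so d|Z|/dZ = 1
        g_pre = g_Z * (pre > 0)                    # ReLU gradient mask
        grads = {
            "W_dec": Z.T @ g_out,
            "b_dec": g_out.sum(axis=0),
            "W_enc": X.T @ g_pre,
            "b_enc": g_pre.sum(axis=0),
        }
        for name, g in grads.items():
            p[name] -= lr * g
    return p, losses


def encode(p, X):
    """Map inputs to SAE latent activations."""
    return np.maximum(X @ p["W_enc"] + p["b_enc"], 0.0)


def rank_latents(Z, y):
    """Rank latents by |Pearson correlation| with the judge's binary choice y.

    Dead latents (zero variance) get correlation 0.
    """
    Zc = Z - Z.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Zc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = np.divide(Zc.T @ yc, denom, out=np.zeros(Z.shape[1]), where=denom > 0)
    order = np.argsort(-np.abs(corr))
    return order, corr


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stand-in for embedding differences of 400 response pairs.
    X = rng.normal(size=(400, 8))
    # Hypothetical judge that prefers response A whenever feature 0 is positive.
    y = (X[:, 0] > 0).astype(float)
    params, losses = train_sae(X)
    order, corr = rank_latents(encode(params, X), y)
    print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}")
    print("top latents:", order[:3], np.round(corr[order[:3]], 2))
```

In a real pipeline, the top-ranked latents would then be inspected (for example, by reading the response pairs that activate them most) to name the concept each one captures. Ranking by correlation is a simple stand-in here for whatever predictive model the paper uses to relate features to judge decisions.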
Original paper via arXiv – CS AI