Evaluating and Mitigating LLM-as-a-judge Bias in Communication Systems
Researchers analyzed bias in 6 large language models used as autonomous judges in communication systems, finding that while current LLM judges show robustness to biased inputs, fine-tuning on biased data significantly degrades performance. The study identified 11 types of judgment biases and proposed four mitigation strategies for fairer AI evaluation systems.