AI · Bearish · arXiv – CS AI · 7h ago · 7/10
AMEL: Accumulated Message Effects on LLM Judgments
Researchers found that prior conversation history systematically biases large language models' evaluations: models shift their judgments toward the polarity of preceding items. The effect persists across 12 models from major providers and is stronger for uncertain cases and negative histories, raising concerns for applications that rely on LLM-based automated evaluation.
🏢 OpenAI · 🏢 Anthropic · 🧠 GPT-5