AIBearish · arXiv → CS AI · 1d ago · 6/10
🧠
The Alignment Tax: Response Homogenization in Aligned LLMs and Its Implications for Uncertainty Estimation
Research finds that aligned language models (e.g., tuned with RLHF or DPO) pay an 'alignment tax': alignment homogenizes their sampled responses, which undermines uncertainty estimation methods that rely on diversity across samples. On TruthfulQA, the study found that 40-79% of questions elicited nearly identical responses, and it identifies preference-alignment procedures such as DPO as the primary cause of this homogenization.
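Why homogenization hurts uncertainty estimation: sampling-based estimators (e.g., semantic entropy) treat disagreement among repeated samples as a signal of model uncertainty. Below is a minimal, stdlib-only sketch of that idea, not the paper's method; the `normalize` helper and the example answers are illustrative assumptions.

```python
import math
from collections import Counter

def normalize(text: str) -> str:
    """Crude normalization so trivially different strings cluster together."""
    return " ".join(text.lower().split()).rstrip(".")

def response_entropy(samples: list[str]) -> float:
    """Shannon entropy (bits) over clusters of normalized responses.

    Sampling-based uncertainty estimators read low entropy as high
    confidence; homogenized outputs drive this toward zero whether or
    not the model actually knows the answer.
    """
    counts = Counter(normalize(s) for s in samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical sampled answers to the same question (toy data).
diverse = ["Paris.", "It's Paris", "Lyon", "paris", "Marseille"]
homogenized = ["I'm sorry, I can't help with that."] * 5

print(f"diverse set entropy:     {response_entropy(diverse):.2f} bits")  # ~1.92
print(f"homogenized set entropy: {response_entropy(homogenized):.2f} bits")  # 0.00
```

When alignment collapses every sample onto the same canned answer, the entropy signal goes to zero regardless of correctness, so such estimators report high confidence even on questions the model gets wrong.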