arXiv – CS AI · 4d ago
Benchmarking Overton Pluralism in LLMs
Researchers introduced OVERTONBENCH, a framework for measuring viewpoint diversity in large language models using the OVERTONSCORE metric. In a study of eight LLMs with 1,208 participants, models scored between 0.35 and 0.41 on a scale from 0 to 1. DeepSeek V3 scored highest, but even the best result leaves significant room for improvement in pluralistic representation.