arXiv – CS AI · Feb 27
Correcting Human Labels for Rater Effects in AI Evaluation: An Item Response Theory Approach
Researchers propose using psychometric modeling to correct systematic biases in human evaluations of AI systems, showing how Item Response Theory (IRT) can separate the true quality of AI outputs from rater effects such as harshness, leniency, and inconsistency. Tested on OpenAI's summarization dataset, the approach improved the reliability of AI model performance measurements.
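The core idea can be illustrated with a much simpler stand-in for IRT: an additive rater-effects model. The sketch below is hypothetical and not from the paper; it assumes each observed score is (item quality + rater bias + noise), uses a sparse design in which each item is scored by only two raters, and shows that subtracting each rater's estimated bias before averaging recovers item quality better than raw means do. Full IRT would instead fit a latent-trait model, but the separation of item quality from rater behavior is the same intuition.

```python
import random

random.seed(0)
n_items, n_raters = 200, 6
true_quality = [random.gauss(0, 1) for _ in range(n_items)]
# Assumed rater biases: some raters are systematically harsh, some lenient.
rater_bias = [-1.5, -0.6, -0.1, 0.1, 0.6, 1.5]

# Sparse design: each item is scored by 2 randomly chosen raters, so raw
# per-item means inherit the biases of whoever happened to rate that item.
ratings = {}  # (item, rater) -> observed score
for i in range(n_items):
    for r in random.sample(range(n_raters), 2):
        ratings[(i, r)] = true_quality[i] + rater_bias[r] + random.gauss(0, 0.2)

# Estimate each rater's bias as their mean deviation from the grand mean.
grand = sum(ratings.values()) / len(ratings)
est_bias = []
for r in range(n_raters):
    vals = [s for (_, rr), s in ratings.items() if rr == r]
    est_bias.append(sum(vals) / len(vals) - grand)

def item_mean(i, corrected):
    """Average an item's scores, optionally removing estimated rater bias."""
    vals = [s - (est_bias[r] if corrected else 0.0)
            for (ii, r), s in ratings.items() if ii == i]
    return sum(vals) / len(vals)

def rmse(corrected):
    """Root-mean-square error of item estimates against true quality."""
    errs = [(item_mean(i, corrected) - true_quality[i]) ** 2
            for i in range(n_items)]
    return (sum(errs) / n_items) ** 0.5

print(rmse(False), rmse(True))  # corrected estimates have lower error
```

In a fully crossed design (every rater scores every item) the correction cancels out; it is the incomplete, unbalanced rating designs typical of AI evaluation where modeling rater effects pays off.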