y0news
#human-preference1 article
1 articles
AINeutralarXiv โ€“ CS AI ยท 4h ago4
๐Ÿง 

LFQA-HP-1M: A Large-Scale Human Preference Dataset for Long-Form Question Answering

Researchers released LFQA-HP-1M, a dataset with 1.3 million human preference annotations for evaluating long-form question answering systems. The study introduces nine quality rubrics and shows that simple linear models can match advanced LLM evaluators while exposing vulnerabilities in current evaluation methods.