y0news
#human-feedback2 articles
2 articles
AINeutralarXiv โ€“ CS AI ยท 4h ago7
๐Ÿง 

Human Supervision as an Information Bottleneck: A Unified Theory of Error Floors in Human-Guided Learning

Researchers propose a unified theory explaining why AI models trained on human feedback exhibit persistent error floors that cannot be eliminated through scaling alone. The study demonstrates that human supervision acts as an information bottleneck due to annotation noise, subjective preferences, and language limitations, requiring auxiliary non-human signals to overcome these structural limitations.

AINeutralarXiv โ€“ CS AI ยท 4h ago2
๐Ÿง 

RewardUQ: A Unified Framework for Uncertainty-Aware Reward Models

Researchers introduce RewardUQ, a unified framework for evaluating uncertainty quantification in reward models used to align large language models with human preferences. The study finds that model size and initialization have the most significant impact on performance, while providing an open-source Python package to advance the field.