y0news
#inference-time-alignment · 1 article
AI · Neutral · arXiv – CS AI · 17h ago · 5/10

Revisiting the (Sub)Optimality of Best-of-N for Inference-Time Alignment

Researchers revisited Best-of-N (BoN) sampling for inference-time alignment and found that it is optimal when judged by win rate rather than by expected true reward. They also propose a variant that eliminates reward-hacking vulnerabilities while retaining that optimal performance.
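For readers unfamiliar with the technique, here is a minimal Python sketch of plain Best-of-N selection and a Monte Carlo win-rate estimate. It is a toy illustration, not the paper's method or its proposed variant: the names `best_of_n` and `estimate_win_rate` are hypothetical, candidates are stand-in random scores rather than model responses, and the reward model is assumed to match true quality exactly (so no reward hacking can occur here).

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def best_of_n(candidates: Sequence[T], reward: Callable[[T], float]) -> T:
    """Return the candidate with the highest reward-model score."""
    if not candidates:
        raise ValueError("best_of_n needs at least one candidate")
    return max(candidates, key=reward)


def estimate_win_rate(n: int, trials: int = 20_000, seed: int = 0) -> float:
    """Estimate how often the BoN pick beats one fresh sample from the base policy.

    Assumption: each "response" is just a uniform random quality score,
    and the reward model reads that score perfectly.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        picked = best_of_n([rng.random() for _ in range(n)], reward=lambda x: x)
        baseline = rng.random()  # a single draw from the base policy
        wins += picked > baseline
    return wins / trials


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, round(estimate_win_rate(n), 3))
```

Under these idealized assumptions the estimate should land near N/(N+1), since the maximum of N independent continuous draws beats one further independent draw with that probability. With an imperfect reward model the picture changes, which is the setting the article's reward-hacking remark refers to.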