y0news
#quantile-estimation · 1 article
AI · Bullish · arXiv – CS AI · 5d ago · 6/103

Quantile Advantage Estimation: Stabilizing RLVR for LLM Reasoning

Researchers propose Quantile Advantage Estimation (QAE) to stabilize Reinforcement Learning with Verifiable Rewards (RLVR) for large language model reasoning. The method replaces mean baselines with group-wise K-quantile baselines to prevent entropy collapse and explosion, showing sustained improvements on mathematical reasoning tasks.
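
To make the baseline swap concrete, here is a minimal Python sketch that compares a mean baseline with a group-wise K-quantile baseline for one prompt's group of sampled responses. It is an illustration under stated assumptions, not the paper's implementation: the function names, the lower-order-statistic quantile rule, the default `k=0.5`, and the absence of any further normalization are all hypothetical choices made for this example.

```python
import math
from typing import List, Sequence


def quantile_baseline(group_rewards: Sequence[float], k: float) -> float:
    """Return the k-quantile of one group's rewards.

    Assumption: the quantile is taken as the lower order statistic
    (no interpolation); the summary does not specify the exact rule.
    """
    if not group_rewards:
        raise ValueError("group_rewards must be non-empty")
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    ordered = sorted(group_rewards)
    index = math.floor(k * (len(ordered) - 1))
    return ordered[index]


def quantile_advantages(group_rewards: Sequence[float], k: float = 0.5) -> List[float]:
    """Advantage of each response relative to the group's k-quantile baseline."""
    baseline = quantile_baseline(group_rewards, k)
    return [reward - baseline for reward in group_rewards]


def mean_advantages(group_rewards: Sequence[float]) -> List[float]:
    """Advantage of each response relative to the group mean, for comparison."""
    if not group_rewards:
        raise ValueError("group_rewards must be non-empty")
    baseline = sum(group_rewards) / len(group_rewards)
    return [reward - baseline for reward in group_rewards]


if __name__ == "__main__":
    hard_prompt = [1.0, 0.0, 0.0, 0.0]  # one verified-correct response out of four
    easy_prompt = [1.0, 1.0, 1.0, 0.0]  # three verified-correct responses out of four

    print("mean baseline, hard prompt:    ", mean_advantages(hard_prompt))
    print("quantile baseline, hard prompt:", quantile_advantages(hard_prompt))
    print("quantile baseline, easy prompt:", quantile_advantages(easy_prompt))
```

With binary verifiable rewards, this sketch gives a nonzero advantage only to the rare correct response on the mostly-failed prompt and only to the rare incorrect response on the mostly-solved prompt, whereas the mean baseline assigns a nonzero advantage to every response in the group. Whether this matches the paper's exact formulation should be checked against the paper itself.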