Can LLMs Learn to Reason Robustly under Noisy Supervision?
arXiv – CS AI | Shenzhi Yang, Guangcheng Zhu, Bowen Song, Sharon Li, Haobo Wang, Xing Zheng, Yingfan Ma, Zhongqi Chen, Weiqiang Wang, Gang Chen
🤖AI Summary
Researchers propose Online Label Refinement (OLR) to make reasoning models trained with Reinforcement Learning with Verifiable Rewards (RLVR) more robust to noisy supervision. The method targets the problem of training language models when expert-labeled data contains errors, and it reports 3–4% performance gains across mathematical reasoning benchmarks.
Key Takeaways
- RLVR training methods are vulnerable to noisy labels due to expert scarcity, with two distinct types of noise affecting model performance differently.
- The Early Correctness Coherence phenomenon shows that clean and noisy samples improve similarly in early training stages before diverging.
- Online Label Refinement progressively corrects noisy labels using majority-voted answers when specific consistency conditions are met.
- OLR demonstrates consistent improvements across noise ratios from 10% to 90% on both in-distribution and out-of-distribution tasks.
- The approach shows promise for making AI reasoning systems more robust to imperfect training data in real-world applications.
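The refinement step described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the `refine_label` helper, the consistency threshold value, and the data shapes are all assumptions introduced here.

```python
from collections import Counter

def refine_label(rollout_answers, current_label, consistency_threshold=0.7):
    """Possibly correct a noisy reference label for one training sample.

    rollout_answers: answers sampled from the model for the same prompt.
    current_label: the (possibly noisy) reference answer.
    consistency_threshold: minimum fraction of rollouts that must agree
        before the majority answer overrides the label (assumed value).
    """
    if not rollout_answers:
        return current_label
    majority_answer, count = Counter(rollout_answers).most_common(1)[0]
    consistency = count / len(rollout_answers)
    # Override only when the rollouts agree strongly with each other
    # and their majority answer disagrees with the current label.
    if consistency >= consistency_threshold and majority_answer != current_label:
        return majority_answer
    return current_label

if __name__ == "__main__":
    # 7 of 8 rollouts agree on "42", so the suspect label "41" is replaced.
    print(refine_label(["42"] * 7 + ["17"], "41"))  # → 42
    # Rollouts disagree with each other, so the label is kept unchanged.
    print(refine_label(["42", "17", "9", "3"], "41"))  # → 41
```

Applied online during RLVR training, a check like this would leave labels untouched while the model is uncertain and only correct them once its own answers become self-consistent, which is the intuition behind the consistency condition in the takeaways.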