
Stabilizing Unsupervised Self-Evolution of MLLMs via Continuous Softened Retracing reSampling

arXiv – CS AI | Yunyao Yu, Zhengxian Wu, Zhuohong Chen, Hangrui Xu, Zirui Liao, Xiangwen Deng, Zhifang Liu, Senyuan Shi, Haoqian Wang
AI Summary

Researchers propose Continuous Softened Retracing reSampling (CSRS) to improve the self-evolution of Multimodal Large Language Models (MLLMs) by addressing biases in feedback mechanisms. The method uses continuous reward signals instead of binary rewards and, with Qwen2.5-VL-7B, achieves state-of-the-art results on mathematical reasoning benchmarks such as MathVision.

Key Takeaways
  • CSRS addresses the problem of biased feedback in MLLM self-evolution by replacing majority voting with more sophisticated reward mechanisms.
  • The Retracing Re-inference Mechanism (RRM) expands exploration of long-tail reasoning paths from anchor points.
  • Softened Frequency Reward (SFR) uses continuous signals instead of binary rewards to better calibrate model training.
  • Visual Semantic Perturbation (VSP) ensures models prioritize mathematical logic over superficial visual patterns.
  • The approach achieves state-of-the-art performance on geometric reasoning tasks and improves Qwen2.5-VL-7B's mathematical reasoning capabilities.
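
To make the contrast between binary and continuous rewards concrete, here is a minimal Python sketch. This summary does not give the paper's actual SFR formula, so the continuous reward below is an assumption chosen only for illustration: each sampled answer is scored by its share of all sampled answers. The function names `majority_vote_rewards` and `frequency_rewards` are hypothetical and do not come from the paper.

```python
from collections import Counter


def majority_vote_rewards(answers):
    """Binary reward: 1.0 if an answer equals the most common answer, else 0.0."""
    counts = Counter(answers)
    majority, _ = counts.most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in answers]


def frequency_rewards(answers):
    """Continuous reward (assumed form, not the paper's formula):
    each answer's share of all sampled answers."""
    counts = Counter(answers)
    n = len(answers)
    return [counts[a] / n for a in answers]


# Five sampled answers to the same question (made-up values).
answers = ["12", "12", "15", "12", "9"]
binary = majority_vote_rewards(answers)
soft = frequency_rewards(answers)
print(binary)
print(soft)
```

Under majority voting, every minority answer receives the same zero reward regardless of how often it appears. A frequency-based score instead gives a graded signal, which is one plausible way a continuous reward could calibrate training more finely than a binary one.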