Stabilizing Unsupervised Self-Evolution of MLLMs via Continuous Softened Retracing reSampling
Researchers propose Continuous Softened Retracing reSampling (CSRS) to improve the self-evolution of Multimodal Large Language Models by addressing biases in feedback mechanisms. The method uses continuous reward signals instead of binary rewards and achieves state-of-the-art results on mathematical reasoning benchmarks like MathVision using Qwen2.5-VL-7B.