AI · Neutral · arXiv – CS AI · 7h ago · 6/10
ProcessThinker: Enhancing Multi-modal Large Language Models Reasoning via Rollout-based Process Reward
ProcessThinker is a post-training method for multimodal large language models that assigns step-level process rewards without training an explicit reward model. It verifies intermediate reasoning steps through rollout-based sampling, improving visual question answering across multiple benchmarks while reducing computational overhead compared to traditional process reward models.
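To make the idea of a rollout-based step reward concrete, here is a minimal sketch under stated assumptions. It is not taken from the paper: it illustrates a common Monte Carlo scheme in which each reasoning prefix is scored by the fraction of sampled continuations that reach the reference answer. The names `rollout_step_rewards`, `rollout_fn`, `gold_answer`, and `toy_rollout` are hypothetical, and the toy rollout function stands in for sampling from an actual model.

```python
from typing import Callable, List


def rollout_step_rewards(
    steps: List[str],
    rollout_fn: Callable[[List[str]], str],
    gold_answer: str,
    num_rollouts: int = 8,
) -> List[float]:
    """Score each step by the success rate of rollouts continued from it.

    Assumption: rollout_fn(prefix) samples one completion of the reasoning
    chain from the given prefix and returns its final answer as a string.
    """
    rewards = []
    for i in range(1, len(steps) + 1):
        prefix = steps[:i]
        # Count how many sampled continuations end at the reference answer.
        hits = sum(rollout_fn(prefix) == gold_answer for _ in range(num_rollouts))
        rewards.append(hits / num_rollouts)
    return rewards


def toy_rollout(prefix: List[str]) -> str:
    """Hypothetical stand-in for a model: any flawed step derails the answer."""
    return "0" if any("wrong" in step for step in prefix) else "4"


if __name__ == "__main__":
    chain = ["count 2 apples", "wrong: subtract 2", "report total"]
    print(rollout_step_rewards(chain, toy_rollout, gold_answer="4"))
```

In this sketch no separate reward model is trained: the step-level signal comes entirely from sampling and checking final answers, at the cost of `num_rollouts` extra generations per step.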