AIBullisharXiv โ CS AI ยท 10h ago7/10
๐ง
Listener-Rewarded Thinking in VLMs for Image Preferences
Researchers introduce a listener-augmented reinforcement learning framework for training vision-language models to better align with human visual preferences. By using an independent frozen model to evaluate and validate reasoning chains, the approach achieves 67.4% accuracy on ImageReward benchmarks and demonstrates significant improvements in out-of-distribution generalization.
๐ข Hugging Face