AI · Bullish · arXiv, CS AI · 9h ago · 6/10
VLA-Thinker: Boosting Vision-Language-Action Models through Thinking-with-Image Reasoning
Researchers introduce VLA-Thinker, a new AI framework that enhances Vision-Language-Action models by enabling dynamic visual reasoning during robotic tasks. The system achieved a 97.5% success rate on LIBERO benchmarks through a two-stage training pipeline combining supervised fine-tuning and reinforcement learning.