AI · Neutral · arXiv – CS AI · 18h ago · 6/10
Sample-Efficient Post-Training for LEGO Spatial-Physics Reasoning
Researchers propose PVPO, a sample-efficient reinforcement learning method that improves LLM-based LEGO assembly generation by addressing PhysHack, a failure mode where structures satisfy physical constraints but lack semantic or geometric coherence. The approach uses selective data training and couples physical feasibility with geometric rewards, achieving better structural alignment while reducing reliance on rejection sampling.
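To make the idea of coupling physical feasibility with a geometric reward concrete, here is a minimal sketch. It is not the PVPO reward from the paper: the function name `coupled_reward`, the voxel-set representation of a structure, and the use of intersection-over-union as the geometric score are all assumptions chosen for illustration. The point it shows is that a structure that is merely stable earns nothing unless it also matches the target shape, which is the gap a PhysHack-style output would exploit under a feasibility-only reward.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]  # hypothetical representation: occupied grid cells


def coupled_reward(is_stable: bool, generated: Set[Voxel], target: Set[Voxel]) -> float:
    """Illustrative reward (an assumption, not the paper's formula).

    The geometric score is the voxel IoU between the generated and target
    shapes; it is only paid out when the structure is physically feasible.
    """
    union = generated | target
    iou = len(generated & target) / len(union) if union else 0.0
    # Gate the geometric score on feasibility so neither signal pays off alone.
    return iou if is_stable else 0.0


target = {(0, 0, 0), (1, 0, 0), (0, 0, 1)}

# Stable but unrelated to the target: a PhysHack-like case, reward is zero.
physhack_like = coupled_reward(True, {(5, 5, 0)}, target)

# Stable and partially aligned: reward equals the IoU (2 shared / 3 total).
aligned = coupled_reward(True, {(0, 0, 0), (1, 0, 0)}, target)

# Perfect shape but physically infeasible: reward is zero.
unstable = coupled_reward(False, set(target), target)
```

A multiplicative gate is only one way to couple the two signals; a weighted sum or a thresholded bonus would be equally plausible readings of the summary.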