AI · Bullish · arXiv CS AI · 5h ago · 7/10
Preference Goal Tuning: Post-Training as Latent Control for Frozen Policies
Researchers introduce Preference Goal Tuning (PGT), a post-training framework that keeps the policy's parameters frozen and instead optimizes the goal embedding as a continuous control variable. In tests on Minecraft SkillForge, PGT achieves 72-81% relative improvement over expert-crafted prompts and generalizes better to out-of-distribution settings than traditional fine-tuning.
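
To make the core idea concrete, here is a minimal toy sketch of tuning a goal embedding by gradient descent while the policy weights stay untouched. Everything in it is an assumption for illustration and is not taken from the paper: the linear softmax policy (`W_S`, `W_G`), the pairwise logistic (Bradley-Terry style) preference loss, the synthetic preference pairs, and the names `loss_and_grad` and `tune_goal` are all hypothetical stand-ins for whatever PGT actually uses.

```python
import math
import random

random.seed(0)
STATE_DIM, GOAL_DIM, N_ACTIONS = 3, 4, 5


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Hypothetical frozen policy: logits = W_S @ state + W_G @ goal.
# These weights are never updated below; only the goal vector changes.
W_S = rand_matrix(N_ACTIONS, STATE_DIM)
W_G = rand_matrix(N_ACTIONS, GOAL_DIM)


def logit(action, state, goal):
    return (sum(w * x for w, x in zip(W_S[action], state))
            + sum(w * x for w, x in zip(W_G[action], goal)))


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def loss_and_grad(goal, pairs, beta=1.0):
    """Assumed pairwise preference loss and its gradient w.r.t. the goal only."""
    total, grad = 0.0, [0.0] * GOAL_DIM
    for state, a_pos, a_neg in pairs:
        # The softmax normalizer cancels, so the log-prob gap is a logit gap.
        margin = beta * (logit(a_pos, state, goal) - logit(a_neg, state, goal))
        total += -log_sigmoid(margin)
        coeff = -beta * (1.0 - sigmoid(margin))
        for k in range(GOAL_DIM):
            grad[k] += coeff * (W_G[a_pos][k] - W_G[a_neg][k])
    n = len(pairs)
    return total / n, [g / n for g in grad]


def tune_goal(goal, pairs, lr=0.1, steps=200):
    """Plain gradient descent on the goal embedding; policy weights stay fixed."""
    goal = list(goal)
    for _ in range(steps):
        _, grad = loss_and_grad(goal, pairs)
        goal = [g - lr * d for g, d in zip(goal, grad)]
    return goal


# Synthetic preferences: action 0 is always preferred over a random other action.
pairs = [([random.gauss(0.0, 1.0) for _ in range(STATE_DIM)],
          0, random.randrange(1, N_ACTIONS)) for _ in range(32)]

frozen_copy = [row[:] for row in W_G]
initial_goal = [0.0] * GOAL_DIM
tuned_goal = tune_goal(initial_goal, pairs)
loss_before, _ = loss_and_grad(initial_goal, pairs)
loss_after, _ = loss_and_grad(tuned_goal, pairs)
```

In this sketch the only trainable quantity is a `GOAL_DIM`-sized vector, which illustrates why tuning the goal is far cheaper than fine-tuning the policy and cannot overwrite what the frozen policy already knows.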