AI · Bullish · arXiv – CS AI · 6h ago · 7/10
Milestone-Guided Policy Learning for Long-Horizon Language Agents
Researchers introduce BEACON, a milestone-guided policy learning framework that improves training efficiency for long-horizon language agents by addressing two problems: credit misattribution and sample inefficiency. The approach reaches a 92.9% success rate on complex tasks, nearly double previous benchmarks, and raises sample utilization from 23.7% to 82.0%.