AI · Neutral · arXiv – CS AI · 7h ago · 6/10
PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models
Researchers introduce PlanningBench, a framework for generating scalable, verifiable planning datasets to evaluate and train large language models on complex task coordination. The system uses a constraint-driven synthesis pipeline with adaptive difficulty control. The authors find that current frontier LLMs struggle with coupled constraints, while reinforcement learning on verified data improves performance on both planning and instruction-following tasks.
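To make the idea of "verifiable planning data with a difficulty knob" concrete, here is a minimal toy sketch. It is not PlanningBench's pipeline: the names `PlanningInstance`, `generate_instance`, and `verify` are hypothetical, the task is a simple ordering problem, and difficulty is approximated by the number of precedence constraints. Constraints are sampled from a hidden reference ordering, so every generated instance is solvable, and any candidate plan can be checked programmatically.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanningInstance:
    """Toy planning problem (hypothetical, not the paper's format)."""
    tasks: tuple[str, ...]
    # Each pair (a, b) means task a must be scheduled before task b.
    precedences: tuple[tuple[str, str], ...]


def generate_instance(num_tasks: int, num_constraints: int, seed: int):
    """Build an instance plus one known-valid plan.

    Constraints are drawn from a hidden ordering, which guarantees
    the instance is satisfiable. More constraints means a harder instance.
    """
    rng = random.Random(seed)
    tasks = [f"t{i}" for i in range(num_tasks)]
    hidden_order = tasks[:]
    rng.shuffle(hidden_order)
    # Every ordered pair that agrees with the hidden ordering.
    pairs = [
        (hidden_order[i], hidden_order[j])
        for i in range(num_tasks)
        for j in range(i + 1, num_tasks)
    ]
    chosen = rng.sample(pairs, min(num_constraints, len(pairs)))
    return PlanningInstance(tuple(tasks), tuple(chosen)), hidden_order


def verify(instance: PlanningInstance, plan: list[str]) -> bool:
    """Return True if plan uses each task exactly once and obeys every precedence."""
    if sorted(plan) != sorted(instance.tasks):
        return False
    position = {task: index for index, task in enumerate(plan)}
    return all(position[a] < position[b] for a, b in instance.precedences)


instance, reference_plan = generate_instance(num_tasks=6, num_constraints=8, seed=0)
print(verify(instance, reference_plan))        # the hidden ordering always passes
print(verify(instance, reference_plan[::-1]))  # the reversed ordering breaks every constraint
```

In a setup like this, the boolean returned by `verify` could serve as an evaluation score or as a reward signal for reinforcement learning, which is the general role verified data plays in the summary above.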