🧠 AI · 🟢 Bullish · Importance 7/10
EnterpriseBench Corecraft: Training Generalizable Agents on High-Fidelity RL Environments
arXiv – CS AI | Sushant Mehta, Logan Ritchie, Suhaas Garre, Ian Niebres, Nick Heiner, Edwin Chen
🤖 AI Summary
Surge AI introduces CoreCraft, the first environment in EnterpriseBench, built for training AI agents on realistic enterprise workflows. Training GLM 4.6 on this high-fidelity customer-support simulation raised its score on held-out tasks from about 25% to 37% and transferred positively to other benchmarks, evidence that high-quality training environments can produce generalizable agent capabilities.
Key Takeaways
- CoreCraft simulates a complete customer-support organization, with 2,500+ entities and 23 tools, to test AI agents on real-world tasks.
- Leading frontier models such as GPT-5.2 and Claude Opus 4.6 solve fewer than 30% of the expert-evaluated tasks.
- Training GLM 4.6 with GRPO (Group Relative Policy Optimization) improved performance on held-out evaluation tasks from 25.37% to 36.76%.
- The training gains transferred to other benchmarks, with improvements of 4.5–7.4% across different domains.
- High-fidelity, diverse training environments with expert evaluation are key to developing generalizable AI agents.
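The summary names GRPO as the training algorithm but gives no reward design or hyperparameters. Below is a minimal sketch, assuming the standard GRPO formulation from the DeepSeekMath work: the policy samples a group of rollouts for the same task, and each rollout's advantage is its reward normalized by the group's mean and standard deviation, so no learned critic is needed. The binary pass/fail reward is a hypothetical stand-in for CoreCraft's expert-defined grading, and `group_relative_advantages` and `headline_gain` are illustrative names, not the paper's code. The second helper simply restates the reported 25.37% to 36.76% result as absolute and relative gains.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: score each rollout against its own group.

    All rewards come from rollouts sampled for the SAME task, so the group
    mean acts as the baseline instead of a learned value function.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


def headline_gain(baseline_pct: float, trained_pct: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain as a fraction)."""
    absolute = trained_pct - baseline_pct
    return absolute, absolute / baseline_pct


# Hypothetical group of 4 rollouts on one task: 1.0 = graded as solved, 0.0 = not.
rewards = [1.0, 0.0, 0.0, 1.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])
# → [1.0, -1.0, -1.0, 1.0]

# Reported held-out scores for GLM 4.6 before and after training.
abs_gain, rel_gain = headline_gain(25.37, 36.76)
print(f"{abs_gain:.2f} points, {rel_gain:.1%} relative")
# → 11.39 points, 44.9% relative
```

Successful rollouts get a positive advantage and failures a negative one. A group in which every rollout receives the same reward yields all-zero advantages and contributes no learning signal. The reported gain works out to 11.39 percentage points, roughly a 45% relative improvement over the baseline.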
#ai-agents #reinforcement-learning #enterprise-ai #benchmarking #training-environments #generalization #customer-support #surge-ai