y0news
🧠 AI | 🟢 Bullish | Importance 7/10

EnterpriseBench Corecraft: Training Generalizable Agents on High-Fidelity RL Environments

arXiv – CS AI | Sushant Mehta, Logan Ritchie, Suhaas Garre, Ian Niebres, Nick Heiner, Edwin Chen
🤖 AI Summary

Surge AI introduces CoreCraft, the first environment in EnterpriseBench for training AI agents on realistic enterprise workflows. Training GLM 4.6 on this high-fidelity customer support simulation raised held-out task performance from 25.37% to 36.76% and transferred positively to other benchmarks, which suggests that high-quality training environments can produce generalizable agent capabilities.

Key Takeaways
  • CoreCraft simulates a complete customer support organization with 2,500+ entities and 23 tools to test AI agent capabilities on real-world tasks.
  • Leading frontier models like GPT-5.2 and Claude Opus 4.6 solve fewer than 30% of expert-evaluated tasks.
  • Training GLM 4.6 with GRPO improved performance from 25.37% to 36.76% on held-out evaluation tasks.
  • The training gains transferred to other benchmarks with improvements of 4.5-7.4% across different domains.
  • High-fidelity, diverse training environments with expert evaluation are key to developing generalizable AI agents.
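
The takeaways mention GRPO (Group Relative Policy Optimization) without describing it. As a rough illustration only: GRPO samples several responses to the same task and scores each one relative to the rest of its group, rather than against a learned value function. The sketch below shows that group-relative normalization in isolation. The function name, the reward values, and the use of the population standard deviation are assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and spread.

    Hypothetical helper for illustration; eps guards against division
    by zero when every response in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled attempts at one task
# (1.0 = the attempt passed evaluation, 0.0 = it failed).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Attempts that beat their group's average receive a positive advantage and the rest a negative one, so a task where every attempt scores the same contributes no learning signal under this scheme.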