IEEE Spectrum – AI · Jan 29

When Will AI Agents Be Ready for Autonomous Business Operations?

Researchers at Carnegie Mellon University and Fujitsu have developed three benchmarks for judging when AI agents are safe enough to run business operations autonomously. The first, FieldWorkArena, showed that current models such as GPT-4o, Claude, and Gemini perform poorly on real-world enterprise tasks, with low accuracy on safety-compliance and logistics applications.