y0news
#economic-reasoning (1 article)
AI · Bearish · arXiv – CS AI · 4h ago · 7/10

CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents

Researchers introduce CostBench, a new benchmark for evaluating AI agents' ability to make cost-optimal decisions and adapt to changing conditions. Testing reveals significant weaknesses in current LLMs: even GPT-5 achieves less than 75% accuracy on complex cost-optimization tasks, and its performance drops further under dynamic conditions.

GPT-5