AI · Bearish · arXiv – CS AI · 4h ago · 7/10
CostBench: Evaluating Multi-Turn Cost-Optimal Planning and Adaptation in Dynamic Environments for LLM Tool-Use Agents
Researchers introduce CostBench, a new benchmark for evaluating AI agents' ability to make cost-optimal decisions and adapt to changing conditions. Testing reveals significant weaknesses in current LLMs, with even GPT-5 achieving less than 75% accuracy on complex cost-optimization tasks, dropping further under dynamic conditions.
GPT-5