BankerToolBench: Evaluating AI Agents in End-to-End Investment Banking Workflows
Researchers introduced BankerToolBench (BTB), an open-source benchmark to evaluate AI agents on investment banking workflows developed with 502 professional bankers. Testing nine frontier models revealed that even the best performer (GPT-5.4) fails nearly half of evaluation criteria, with zero outputs rated client-ready, highlighting significant gaps in AI readiness for high-stakes professional work.

