AI · Bearish · arXiv – CS AI · 8h ago · 7/10
CFAgentBench: A Reproducible Environment and Benchmark for Autonomous Construction-Finance Agents
Researchers introduce CFAgentBench, a benchmark for testing autonomous AI agents on construction-finance workflows. It comprises 1,014 task specifications spanning real software tools (ERP, payroll, banking portals) and grades them with strict functional checks. The results show that top models reach only 67% accuracy on a single attempt, and that figure collapses to 38% when consistency is required.
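The gap between the single-attempt figure and the consistency figure is easier to see with a small worked example. The sketch below is not the benchmark's grader. It assumes that "consistency" means a task counts only if every repeated attempt passes, which the summary does not spell out. The task names, the number of attempts, and the pass/fail values are hypothetical.

```python
from typing import Dict, List

# Hypothetical pass/fail outcomes for four made-up tasks, four attempts each.
# None of these names or values come from CFAgentBench.
results: Dict[str, List[bool]] = {
    "invoice_entry": [True, True, True, True],
    "payroll_run": [True, False, True, True],
    "bank_reconciliation": [True, True, False, False],
    "lien_waiver": [False, False, False, False],
}


def single_attempt_rate(outcomes: Dict[str, List[bool]]) -> float:
    """Average per-attempt success rate, computed per task and then averaged."""
    per_task = [sum(attempts) / len(attempts) for attempts in outcomes.values()]
    return sum(per_task) / len(per_task)


def consistent_rate(outcomes: Dict[str, List[bool]]) -> float:
    """Fraction of tasks on which every attempt passed (assumed definition)."""
    passed_all = sum(all(attempts) for attempts in outcomes.values())
    return passed_all / len(outcomes)


print(f"single attempt: {single_attempt_rate(results):.2%}")  # 56.25%
print(f"consistent:     {consistent_rate(results):.2%}")      # 25.00%
```

Under this assumed definition, a task that passes three times out of four still adds to the single-attempt average but contributes nothing to the consistency score, so the second number can never be higher than the first.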