AINeutralarXiv โ CS AI ยท 14h ago6/10
๐ง
HealthAdminBench: Evaluating Computer-Use Agents on Healthcare Administration Tasks
Researchers introduced HealthAdminBench, a new evaluation framework with 135 tasks across realistic healthcare administration workflows, revealing that current AI agents achieve only 36.3% end-to-end success despite strong individual subtask performance. The benchmark demonstrates a critical gap between AI capabilities and the reliability requirements for automating healthcare administrative processes worth over $1 trillion annually.
๐ง GPT-5๐ง Claude๐ง Opus