AI · Bearish · arXiv – CS AI · 6h ago · 6/10
Mind the Gap: Can Frontier LLMs Pass a Standardized Office Proficiency Exam?
Researchers benchmarked seven frontier LLMs against China's National Computer Rank Examination, a standardized office proficiency test with 200 practical tasks across Word, Excel, and PowerPoint. Single-turn models achieved only 36.6% accuracy, while advanced agentic systems with iterative feedback reached 68.8%. The results reveal significant gaps in LLM-based office automation despite recent improvements in code generation.