AI · Bearish · arXiv – CS AI · 5h ago · 7/10
Frontier Lag: A Bibliometric Audit of Capability Misrepresentation in Academic AI Evaluation
A bibliometric audit finds that academic papers evaluating large language models systematically lag behind frontier AI capabilities by a median of 10.85 points on the Epoch AI Capabilities Index, and that this gap is widening by 5.53 points per year. The study also finds that most papers fail to disclose critical configuration details and make broad claims about "AI" capabilities rather than about the specific models tested, which distorts how AI progress is understood in policy and media.
GPT-4 · GPT-5 · Claude