AI · Bearish · arXiv – CS AI · 9h ago · 7/10
Position: State-of-the-Art Claims Require State-of-the-Art Evidence
Researchers identify a widespread gap between state-of-the-art claims in AI/ML research and the evidence supporting them. Their analysis of ten major benchmarks finds that marginal improvements in aggregate scores often mask fragility: the gains are driven by outlier datasets rather than by meaningful superiority across tasks.
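To make the fragility point concrete, here is a minimal sketch of one way such a check could be run: recompute the average gain with each dataset held out in turn, and flag any dataset whose removal erases the improvement. This is not the paper's method. The dataset names, the scores, and the helper functions `mean_gain` and `leave_one_out_gains` are all hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical per-dataset scores (made-up numbers for illustration only,
# not taken from the paper or from any real benchmark).
baseline = {"A": 70.0, "B": 65.0, "C": 80.0, "D": 55.0, "E": 60.0}
new_model = {"A": 69.8, "B": 64.9, "C": 79.7, "D": 54.9, "E": 63.0}


def mean_gain(new, old, skip=None):
    """Average per-dataset gain of `new` over `old`, optionally skipping one dataset."""
    return mean(new[k] - old[k] for k in old if k != skip)


def leave_one_out_gains(new, old):
    """Mean gain recomputed with each dataset held out in turn."""
    return {k: mean_gain(new, old, skip=k) for k in old}


overall = mean_gain(new_model, baseline)
held_out = leave_one_out_gains(new_model, baseline)

# A dataset is flagged when the aggregate gain is positive overall
# but is no longer positive once that dataset is removed.
fragile = [k for k, g in held_out.items() if g <= 0 < overall]

print(f"overall mean gain: {overall:+.2f}")
for name, gain in held_out.items():
    print(f"without {name}: {gain:+.3f}")
print("gain depends on:", fragile)
```

In this toy setup the new model is slightly worse on four of the five datasets, yet the aggregate score still rises because of dataset E alone. Holding E out flips the sign of the average gain.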