#coding-evaluation · 1 article
AI · Neutral · OpenAI News · Feb 23 · 6/105

Why we no longer evaluate SWE-bench Verified

OpenAI no longer evaluates on SWE-bench Verified, a popular coding evaluation benchmark, citing increasing contamination and flawed testing methodology. Its analysis points to training data leakage and unreliable test cases that fail to accurately measure AI coding capabilities, and it recommends SWE-bench Pro as the replacement.