OpenAI News · Feb 23
Why we no longer evaluate SWE-bench Verified
OpenAI is no longer evaluating on SWE-bench Verified, a popular coding benchmark, because of increasing contamination and flawed test cases. Its analysis finds training data leakage and unreliable tests, so scores no longer accurately measure AI coding capabilities. SWE-bench Pro is recommended as the replacement.