y0news

#code-llm-evaluation News & Analysis

1 article tagged with #code-llm-evaluation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · 18h ago · 7/10

Beyond Pass Rate: A Multilingual, Execution-Grounded Evaluation of Open Code LLMs

A comprehensive evaluation of 9 open-source coding LLMs across 2,707 LeetCode problems in 12 programming languages reveals significant performance gaps compared to human developers. The best model achieves only 23.64% correctness versus a 57.2% human baseline, with performance varying substantially across languages and problem types, indicating that aggregate benchmarks mask critical weaknesses in code generation systems.
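The claim that aggregate scores can hide language-specific weaknesses comes down to simple arithmetic, and a small calculation makes it concrete. The sketch below assumes each result is stored as a hypothetical `(language, passed)` record and uses invented toy numbers, not the paper's data or its exact metric definition. It computes one overall correctness rate alongside a per-language breakdown.

```python
from collections import defaultdict

# Hypothetical toy results, one (language, passed) record per problem.
# These numbers are invented for illustration and are NOT from the paper.
results = (
    [("python", True)] * 6 + [("python", False)] * 4
    + [("rust", True)] * 1 + [("rust", False)] * 9
)


def correctness_by_language(records):
    """Return (aggregate_rate, {language: rate}) for (language, passed) records."""
    totals = defaultdict(int)  # problems attempted per language
    passes = defaultdict(int)  # problems solved correctly per language
    for lang, passed in records:
        totals[lang] += 1
        passes[lang] += int(passed)
    per_lang = {lang: passes[lang] / totals[lang] for lang in totals}
    # Pooled rate over all problems, ignoring which language each belongs to.
    aggregate = sum(passes.values()) / sum(totals.values())
    return aggregate, per_lang


aggregate, per_lang = correctness_by_language(results)
print(f"aggregate: {aggregate:.2%}")
for lang, rate in sorted(per_lang.items()):
    print(f"{lang}: {rate:.2%}")
```

In this toy case the single aggregate figure of 35% hides a split of 60% on Python and 10% on Rust. That gap is the kind of variation a per-language, execution-grounded breakdown is meant to expose.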