AI · Neutral · arXiv – CS AI · 6h ago · 6/10
LibEvoBench: Probing Temporal Knowledge Stratification in Code Generation Models
Researchers introduce LibEvoBench, a benchmark that tests how well AI code generation models handle multiple versions of Python libraries. The study finds that state-of-the-art LLMs struggle with version-specific API knowledge and make anachronistic errors as libraries evolve, although providing documentation significantly improves performance.