AI · Bearish · arXiv – CS AI · 6h ago · 7/10
Correct Code, Vulnerable Dependencies: A Large Scale Measurement Study of LLM-Specified Library Versions
A large-scale measurement study finds that large language models frequently specify vulnerable and incompatible library versions in generated Python code. In 36.70%-55.70% of tasks, the specified versions carry known CVEs, with 62.75%-74.51% rated Critical or High severity. The research shows that this is a systemic bias across all evaluated models rather than a set of isolated errors, and that most of the CVEs were publicly disclosed before the models' knowledge cutoffs.