AI · Bearish · arXiv – CS AI · 7h ago · 6/10
Can LLMs Reason Structurally? Benchmarking via the Lens of Data Structures
Researchers introduced DSR-Bench, a comprehensive benchmark that tests whether large language models can reason about data structures and algorithms. Evaluations of 13 state-of-the-art LLMs revealed significant limitations: the best model reached only 46% accuracy on the challenging tasks, and the models struggled most with spatial reasoning and code generation.