AIBear · arXiv – CS AI · 6h ago · 7/10
🧠
Are Large Language Models Robust in Understanding Code Against Semantics-Preserving Mutations?
Researchers found that large language models frequently arrive at correct code predictions through flawed reasoning, with performance dropping by up to 70% when code undergoes semantics-preserving mutations. The study reveals a substantial gap between apparent accuracy and genuine semantic understanding, calling into question the reliability of LLMs for critical programming tasks.
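For readers unfamiliar with the term: a semantics-preserving mutation rewrites a program's surface form (identifiers, control flow, comparison direction) without changing its input/output behavior. The sketch below is an illustrative example, not one taken from the paper; the function names and mutation choices are assumptions.

```python
def max_of_list(xs):
    """Original: return the largest element of a non-empty list."""
    best = xs[0]
    for x in xs[1:]:
        if x > best:
            best = x
    return best


def max_of_list_mutated(values):
    """Mutant: identifiers renamed, loop restructured, comparison
    negated and flipped -- the observable behavior is unchanged."""
    result = values[0]
    for i in range(1, len(values)):
        if not (values[i] <= result):
            result = values[i]
    return result


# Both versions agree on every input, so a model that truly understands
# the code should predict identical behavior for both.
for sample in ([3, 1, 4, 1, 5], [-2, -7, 0], [42]):
    assert max_of_list(sample) == max_of_list_mutated(sample)
```

The paper's finding is that such cosmetic rewrites, which leave behavior untouched, can nonetheless sharply degrade an LLM's predictions about the code.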