AIBearisharXiv – CS AI · 7h ago7/10
🧠
Beyond Memorization: Assessing Semantic Generalization in Large Language Models Using Phrasal Constructions
Researchers have developed a diagnostic evaluation framework using Construction Grammar to test whether large language models like GPT-o1 can truly understand language semantics beyond memorized patterns. The study reveals that state-of-the-art models fail to generalize across syntactically identical constructions with different meanings, dropping over 40% in performance on this task—a capability humans perform intuitively.