The Grammar of Transformers: A Systematic Review of Interpretability Research on Syntactic Knowledge in Language Models
A systematic review of 337 studies examines how Transformer-based language models encode syntactic knowledge, finding strong performance on formal syntax but variable results at the syntax-semantics interface. The review concludes that although behavioral and mechanistic evidence shows these models have non-trivial syntactic abilities, understanding of the underlying computational mechanisms remains limited by methodological heterogeneity and a heavy concentration on English and BERT-like architectures.
This systematic review consolidates fragmented interpretability research on how Transformer language models process grammatical structure. By aggregating more than 3,000 data points across diverse methodologies and languages, the researchers establish that modern LLMs do encode meaningful syntactic knowledge, a finding that challenges earlier skepticism about whether neural networks truly capture linguistic rules or merely exploit statistical patterns. The gap between strong performance on formal syntax and weaker results at the syntax-semantics interface suggests that these models acquire structural rules differently from the way they acquire semantic understanding, which has important implications for how we conceptualize language model capabilities.
This work emerges from a growing recognition that interpretability research remains fractured across competing methodologies. Probing studies, mechanistic investigations, and behavioral evaluations each offer a different window into model cognition, but the field lacks a unified framework for comparing them. The concentration on English and BERT-derived models highlights a critical blind spot: claims about universal syntactic properties may reflect dataset bias rather than fundamental principles of how Transformers process language. The consistently lower performance on languages with limited digital representation mirrors broader AI equity concerns that also affect commercial deployments.
For AI developers and researchers, these findings suggest that syntactic knowledge in language models is more robust than previously understood, but also more opaque. The review's call for methodological standardization addresses a real research gap that hinders progress on model alignment and reliability. Its emphasis on underrepresented languages points toward future work that could improve multilingual model performance and reduce Western-language bias in AI systems.
- Transformer models encode non-trivial syntactic knowledge, validated through behavioral, probing, and mechanistic studies across 337 research articles.
- Performance gaps exist between formal syntax tasks and syntax-semantics interface phenomena, suggesting differential learning mechanisms for structure versus meaning.
- Research concentration on English and BERT-like models limits generalizability; findings may not extend to other architectures or low-resource languages.
- Methodological heterogeneity across studies impedes deep understanding of computational mechanisms underlying syntactic processing in language models.
- Languages with less digital support consistently show lower model performance, reflecting broader representation gaps in AI training data.