Measuring Form and Function in Language Models
Researchers introduce Contextual Alternative Choice (CAC), a new evaluation method that measures both syntactic and functional properties of language models using metrics derived from child language acquisition studies. Although some large language models approach human-level performance on these benchmarks, no model trained on a comparable volume of data simultaneously meets both the formal and the functional standards that children reach early in development.