🧠 AI · Neutral · Importance 6/10

How Do Language Models Compose Functions?

arXiv – CS AI | Apoorv Khandelwal, Ellie Pavlick

🤖 AI Summary

Researchers investigate how large language models solve compositional tasks, revealing that LLMs employ two distinct mechanisms, compositional and direct, rather than consistently breaking problems into intermediate steps. The study demonstrates that embedding-space geometry determines which mechanism dominates: direct solving is more prevalent when a task aligns with a translation pattern in the model's embedding space.

Analysis

This research addresses a fundamental question about LLM reasoning: whether models that appear compositionally capable actually use compositional mechanisms internally. The study focuses on two-hop factual recall tasks expressed as g(f(x)), where an LLM must first compute an intermediate value f(x) and then apply g to it. The persistence of the 'compositionality gap', where models can solve each function individually but struggle with their composition, suggests reasoning shortcuts rather than genuine step-by-step logic.
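As a concrete illustration (ours, not the paper's), consider the two-hop query "What is the capital of the country where the Eiffel Tower is located?": f maps an entity to its country, and g maps a country to its capital. Below is a minimal sketch of how the compositionality gap could be measured, assuming a hypothetical `query_model` helper that wraps any LLM API:

```python
# Hypothetical sketch: measuring the compositionality gap on two-hop
# factual recall. `query_model` is an assumed stand-in for an LLM call,
# not the paper's code.

def query_model(prompt: str) -> str:
    """Placeholder for an LLM call; replace with a real API."""
    raise NotImplementedError

def compositionality_gap(examples) -> float:
    """Fraction of items where both hops succeed in isolation
    but the composed question g(f(x)) still fails."""
    both_hops_correct, composed_wrong = 0, 0
    for ex in examples:
        # Hop 1: f(x), e.g. "Which country is the Eiffel Tower in?"
        f_ok = query_model(ex["f_prompt"]).strip() == ex["f_answer"]
        # Hop 2: g(y), e.g. "What is the capital of France?"
        g_ok = query_model(ex["g_prompt"]).strip() == ex["g_answer"]
        if f_ok and g_ok:
            both_hops_correct += 1
            # Composed: "What is the capital of the country where
            # the Eiffel Tower is located?"
            answer = query_model(ex["composed_prompt"]).strip()
            if answer != ex["g_answer"]:
                composed_wrong += 1
    return composed_wrong / both_hops_correct if both_hops_correct else 0.0
```

A gap near zero means the model composes what it already knows; a large gap means knowing f and g separately does not transfer to g(f(x)).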

The finding that LLMs employ dual mechanisms has significant implications for understanding model behavior. When embedding spaces naturally represent tasks as direct translations from input to output, models exploit this geometry rather than decomposing problems. This suggests that apparent compositional reasoning may reflect data statistics and geometric properties rather than learned compositional principles. The research indicates that LLM capabilities depend heavily on how information structures align with underlying embedding spaces.
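To make the "translation" intuition concrete, here is a minimal sketch (our construction, not the paper's) of testing whether a composed relation behaves like a single vector offset in embedding space, in the spirit of classic word-vector analogies. `embed` is an assumed embedding lookup:

```python
# Hypothetical sketch: does the composed map g∘f look like one
# translation vector t, i.e. embed(answer) ≈ embed(input) + t?
import numpy as np

def embed(token: str) -> np.ndarray:
    """Placeholder embedding lookup; replace with real model embeddings."""
    raise NotImplementedError

def fit_translation(train_pairs):
    """Average offset t over (input, final-answer) training pairs."""
    return np.mean([embed(y) - embed(x) for x, y in train_pairs], axis=0)

def predict(x: str, t: np.ndarray, vocab) -> str:
    """Nearest-neighbor decode of embed(x) + t by cosine similarity."""
    target = embed(x) + t
    def cos(w):
        v = embed(w)
        return (v @ target) / (np.linalg.norm(v) * np.linalg.norm(target))
    return max(vocab, key=cos)
```

If a single t fit on training pairs predicts held-out answers in one step, the composed relation is geometrically "flat" enough to be solved directly, with no intermediate f(x) required.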

For AI developers and researchers, these insights challenge assumptions about model reasoning transparency. If models switch between compositional and direct solving based on geometric properties, mechanistic interpretability efforts must account for this flexibility. This affects how we design training procedures, interpret model outputs, and predict failure modes. Understanding when models employ which mechanism becomes crucial for building more reliable systems. The work also hints at potential training interventions—encouraging compositional representations through geometric constraints might improve systematic generalization, though this remains speculative.
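One concrete way to look for mechanism switching (a common interpretability recipe, not necessarily the paper's exact method) is a logit-lens-style probe: project each layer's hidden state through the unembedding matrix and check whether the intermediate answer f(x) ever becomes highly ranked. A sketch, assuming per-layer hidden states and an `unembed` projection are available:

```python
# Hypothetical logit-lens-style probe for a compositional trace.
# If the intermediate answer f(x) is decodable at some middle layer,
# that is evidence for the compositional mechanism; if only the final
# answer ever surfaces, that is consistent with direct solving.
import torch

def intermediate_rank_per_layer(hidden_states, unembed, intermediate_id):
    """Rank of the intermediate-answer token at each layer
    (0 = top of the vocabulary distribution)."""
    ranks = []
    for h in hidden_states:          # one [d_model] tensor per layer
        logits = unembed(h)          # project to vocabulary logits
        rank = (logits > logits[intermediate_id]).sum().item()
        ranks.append(rank)
    return ranks
```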

Key Takeaways
  • Large language models exhibit a compositionality gap where solving individual functions doesn't guarantee solving their composition.
  • LLMs employ two distinct solving mechanisms: compositional (computing intermediate steps) and direct (computing final output without detectable intermediate stages).
  • Embedding space geometry determines which mechanism models employ, with translation-based representations favoring direct solving.
  • Apparent compositional reasoning may reflect data statistics and geometric properties rather than genuine step-by-step logic.
  • Findings suggest mechanistic interpretability research must account for mechanism switching based on task geometry.
Read Original → via arXiv – CS AI