What Suppresses Nash Equilibrium Play in Large Language Models? Mechanistic Evidence and Causal Control
Researchers discovered that large language models compute Nash equilibrium strategies in strategic games but actively suppress them through a prosocial override mechanism in final layers, favoring cooperation instead. The suppression can be reversed through mechanistic intervention, revealing that LLM deviations from rational play stem not from inability but from built-in behavioral constraints that vary with model scale and architecture.
This research exposes a fundamental tension in LLM design: models possess the computational capacity to identify optimal game-theoretic strategies yet systematically suppress them. The mechanistic analysis finds opponent history encoded at 96% fidelity while the Nash action is only weakly represented at 56%, suggesting models prioritize modeling their interaction partners over pursuing zero-sum victory. The prosocial override, concentrated in the final layers, represents an implicit alignment choice: safety training or architectural biases push models toward cooperative rather than rationally self-interested outcomes.
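The encoding figures read like linear-probe accuracies on internal activations. Below is a minimal sketch of that kind of measurement, assuming a logistic-regression probe; the activations and labels here are synthetic placeholders, and the real layer choice, probe setup, and game data come from the paper, not this code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Placeholder stand-ins: one activation vector per game round, plus two
# label sets. Real data would be residual-stream activations from the model.
n_rounds, d_model = 2000, 512  # d_model reduced for the sketch
acts = rng.normal(size=(n_rounds, d_model))
opp_history = rng.integers(0, 2, size=n_rounds)  # opponent's last action
nash_action = rng.integers(0, 2, size=n_rounds)  # Nash-prescribed action

def probe_accuracy(X, y):
    """5-fold cross-validated accuracy of a linear probe on activations."""
    return cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()

# With real activations, the reported pattern is ~0.96 for opponent history
# and ~0.56 for the Nash action; random placeholders land near chance (0.5).
print("opponent-history probe:", probe_accuracy(acts, opp_history))
print("nash-action probe:     ", probe_accuracy(acts, nash_action))
```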
The behavioral findings demonstrate scale-dependent effects absent in earlier LLM studies. Chain-of-thought reasoning paradoxically degrades Nash play in small models below 70B parameters but enables near-perfect rationality in larger ones, suggesting explicit reasoning pathways interact differently with prosocial constraints across scales. Cross-play experiments reveal emergent phenomena invisible in self-play: small models can exploit cooperative partners through strategic defection, large models mutually reinforce cooperation indefinitely, and first-mover advantage determines equilibrium selection in coordination games.
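To make the cross-play dynamics concrete, here is a toy iterated prisoner's dilemma harness. Everything in it is illustrative: `small_model` and `large_model` are hand-written stubs standing in for LLM policies, and the payoff matrix is the textbook one, not necessarily the paper's.

```python
# Textbook prisoner's dilemma payoffs, as (player_a, player_b) tuples.
PAYOFFS = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

def play_match(policy_a, policy_b, rounds=10):
    """Iterate the game; each policy sees the history from its own side."""
    history, score_a, score_b = [], 0, 0
    for _ in range(rounds):
        a = policy_a(history)
        b = policy_b([(mb, ma) for ma, mb in history])  # flipped perspective
        pa, pb = PAYOFFS[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        history.append((a, b))
    return score_a, score_b

# Stubs mimicking the reported pattern: an unconditional defector versus a
# grim-trigger cooperator that defects forever once its partner ever has.
small_model = lambda hist: "D"
large_model = lambda hist: "C" if all(opp == "C" for _, opp in hist) else "D"

print(play_match(large_model, large_model))  # (30, 30): cooperation persists
print(play_match(small_model, large_model))  # (14, 9): defection unravels it
```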
For AI safety and development, this work provides actionable mechanistic evidence that LLM behavior reflects deliberate architectural choices rather than capability gaps. The ability to inject learned Nash directions and shift behavior bidirectionally demonstrates precise causal control over strategic decision-making. This matters for deployed systems in competitive domains (negotiations, resource allocation, security games), where suppressed rationality could create exploitable vulnerabilities or reduce economic efficiency. The findings also suggest that future models may need explicit game-theoretic alignment specifications alongside current safety objectives, particularly as scaling continues to enhance cooperative and rational capacities simultaneously.
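The "inject learned Nash directions" intervention is plausibly an activation-steering hook of the kind sketched below, assuming a PyTorch decoder with HuggingFace-style layer naming; the layer index, coefficient, and `nash_direction` vector are illustrative assumptions rather than the paper's values.

```python
import torch

def make_steering_hook(direction: torch.Tensor, alpha: float):
    """Add alpha * unit(direction) to the residual stream at every position.
    alpha > 0 pushes output toward Nash play; alpha < 0 strengthens the
    prosocial override, giving the bidirectional control described above."""
    unit = direction / direction.norm()
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + alpha * unit.to(dtype=hidden.dtype, device=hidden.device)
        return (steered, *output[1:]) if isinstance(output, tuple) else steered
    return hook

# Hypothetical usage with a loaded HuggingFace-style causal LM:
# layer = model.model.layers[-3]  # one of the final decoder blocks
# handle = layer.register_forward_hook(make_steering_hook(nash_direction, 4.0))
# ... run generation ...
# handle.remove()
```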
- LLMs compute Nash equilibrium strategies internally but actively suppress them through prosocial override mechanisms in final layers, not from an inability to calculate optimal play.
- Opponent history encoding reaches 96% accuracy while Nash action encoding remains weak at 56%, indicating models prioritize understanding partners over rational self-interest.
- Chain-of-thought reasoning worsens Nash equilibrium play in models below 70B parameters but enables near-perfect rationality above that threshold.
- Small models can unravel any partner's cooperation through early defection, while large models mutually reinforce cooperative behavior indefinitely in cross-play.
- Mechanistic interventions like concept clamping enable bidirectional control over strategic behavior (see the sketch below), providing precise causal evidence of suppression rather than incapacity.
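For the clamping variant named in the last bullet, a hedged sketch: rather than adding a vector, the activation's component along the learned direction is pinned to a fixed value on every forward pass. As above, the hook interface is standard PyTorch, but the direction and target values are assumptions.

```python
import torch

def make_clamp_hook(direction: torch.Tensor, target: float):
    """Fix the residual stream's scalar projection onto `direction` at `target`."""
    unit = direction / direction.norm()
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        d = unit.to(dtype=hidden.dtype, device=hidden.device)
        proj = (hidden @ d).unsqueeze(-1)          # current component, per position
        clamped = hidden - proj * d + target * d   # remove it, pin to target
        return (clamped, *output[1:]) if isinstance(output, tuple) else clamped
    return hook

# target > 0 clamps activations toward Nash play; target < 0 toward
# cooperation, mirroring the bidirectional control reported above.
```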