🧠 AI · 🔴 Bearish · Importance 7/10

In-Context Fixation: When Demonstrated Labels Override Semantics in Few-Shot Classification

arXiv – CS AI | Ming Liu
🤖 AI Summary

Researchers demonstrate that large language models suffer from 'in-context fixation,' where homogeneous demonstration labels—even semantically valid ones—cause classification accuracy to collapse below 12%. The models treat label-slot tokens as an exhaustive vocabulary set rather than learning from semantic meaning, revealing that in-context learning operates as constrained vocabulary retrieval rather than genuine concept learning.
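To make the setup concrete, here is a minimal sketch (not the authors' code) of how one might construct homogeneous versus mixed few-shot prompts to probe this effect; the sentiment task, labels, and example texts are hypothetical placeholders.

```python
# Hypothetical probe of in-context fixation: compare prompts whose demonstration
# labels are homogeneous (one label repeated) vs. mixed (full label vocabulary).

def build_prompt(demos, query):
    """Format (text, label) demonstrations plus a query in a simple ICL template."""
    lines = [f"Review: {text}\nSentiment: {label}" for text, label in demos]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

# Homogeneous demonstrations: every label slot shows the same (still valid) token.
homogeneous_demos = [
    ("The plot was gripping from start to finish.", "positive"),
    ("A warm, beautifully acted film.", "positive"),
    ("One of the best releases this year.", "positive"),
]

# Mixed demonstrations: the label slots expose the full label vocabulary.
mixed_demos = [
    ("The plot was gripping from start to finish.", "positive"),
    ("Dull pacing and a forgettable script.", "negative"),
    ("One of the best releases this year.", "positive"),
]

query = "I walked out halfway through; a complete waste of time."
print(build_prompt(homogeneous_demos, query))
print("---")
print(build_prompt(mixed_demos, query))
```

Under the fixation account described above, the homogeneous prompt should push the model to answer "positive" for the clearly negative query, because "negative" never appears in the demonstrated label slots.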

Analysis

This research exposes a fundamental limitation in how large language models perform few-shot classification through in-context learning. Rather than inferring semantic patterns from demonstrations, models appear to mechanically bind outputs to the exact token inventory presented in examples, regardless of semantic plausibility. The findings challenge prevailing theories about in-context learning as Bayesian concept inference, suggesting instead that models engage in surface-level pattern matching constrained to demonstrated tokens.

The research builds on prior work questioning the robustness of ICL (Min et al., 2022) but provides mechanistic clarity through paired activation patching and logit lens analysis. The circuit localization to layer 7 in Pythia-1B and the cross-architecture replication in Llama establish this as a generalizable phenomenon across model scales and families. The decomposition into format-level and content-level fixation components offers actionable insight into where the breakdown occurs.
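As an illustration of the kind of layer-wise probing involved, the following is a hedged logit-lens sketch, not the authors' pipeline: it projects each layer's residual stream through the model's final LayerNorm and unembedding matrix and reads off label-token probabilities. It assumes the Hugging Face transformers API and the public EleutherAI/pythia-1b checkpoint; the `final_layer_norm` and output-embedding attributes used here are specific to the GPT-NeoX architecture.

```python
# Hedged logit-lens sketch (assumed setup, not the authors' code): inspect how
# label-token probabilities evolve across layers for a few-shot prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "EleutherAI/pythia-1b"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
model.eval()

prompt = "Review: Dull pacing and a forgettable script.\nSentiment:"
# First subword token of each label verbalizer (hypothetical label set).
label_ids = [tok(" positive").input_ids[0], tok(" negative").input_ids[0]]

with torch.no_grad():
    out = model(**tok(prompt, return_tensors="pt"), output_hidden_states=True)

# "Logit lens": decode each layer's residual stream at the final position
# through the final LayerNorm and the unembedding matrix.
final_norm = model.gpt_neox.final_layer_norm   # GPT-NeoX-specific attribute
unembed = model.get_output_embeddings()        # lm head (embed_out)

for layer, h in enumerate(out.hidden_states):  # index 0 is the embedding layer
    logits = unembed(final_norm(h[:, -1, :]))
    probs = torch.softmax(logits, dim=-1)[0, label_ids]
    print(f"layer {layer:2d}  p(positive)={probs[0]:.3f}  p(negative)={probs[1]:.3f}")
```

Under the paper's account as summarized here, one would expect the demonstrated label tokens to dominate from the mid layers onward, largely independent of the query's semantics.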

For AI practitioners and researchers, these findings carry significant implications. Current assumptions about ICL reliability in production systems may be overstated, particularly for tasks sensitive to label distribution shifts. The vocabulary-binding behavior suggests that few-shot performance depends critically on demonstration quality, not just semantic validity. Developers deploying language models for classification should anticipate failures when the labels a query requires differ, even slightly, from those shown in the in-context demonstrations.
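One cheap, illustrative safeguard consistent with this point is a pre-deployment check that the demonstration labels actually cover the label set; the function below is a hypothetical sketch, not something proposed in the paper.

```python
# Hypothetical pre-deployment check: warn if few-shot demonstrations are
# homogeneous or under-represent the label vocabulary.
from collections import Counter

def check_demo_labels(demos, label_set):
    """Flag demonstration sets likely to trigger vocabulary-binding failures."""
    counts = Counter(label for _, label in demos)
    missing = set(label_set) - set(counts)
    if len(counts) == 1:
        print("WARNING: all demonstration labels are identical; "
              "expect failures on other classes.")
    elif missing:
        print(f"WARNING: labels never shown in demonstrations: {sorted(missing)}")
    else:
        print(f"OK: demonstration label counts = {dict(counts)}")

check_demo_labels(
    demos=[("great film", "positive"), ("loved it", "positive")],
    label_set=["positive", "negative", "neutral"],
)
```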

Future work should investigate mitigation strategies—whether through prompt engineering, architectural modifications, or alternative learning paradigms. Understanding whether this fixation reflects fundamental model limitations or addressable training artifacts remains crucial for advancing reliable few-shot learning systems.

Key Takeaways
  • LLMs collapse classification accuracy below 12% when demonstration labels are homogeneous, revealing vocabulary-binding rather than semantic learning
  • Models constrain outputs to demonstrated token sets regardless of semantic validity, treating label-slot content as exhaustive answer vocabulary
  • Circuit analysis localizes fixation to layer-7-centered pathways with 98.4% recovery rate, indicating a specific mechanistic bottleneck
  • The effect generalizes across six models (0.8B–8B parameters), four tasks, and multi-token verbalizers, confirming robustness of the phenomenon
  • Findings challenge Bayesian latent-concept theories of ICL and suggest models engage in constrained vocabulary retrieval rather than genuine inference
Mentioned AI models: Llama (Meta)
Read Original → via arXiv – CS AI