🧠 AI | ⚪ Neutral | Importance 6/10

When Correct Demonstrations Hurt: Rethinking the Role of Exemplars in In-Context Learning

arXiv – CS AI | Chenghao Qiu, Chunli Peng, Yufeng Yang, Kuan-Hao Huang, Yi Zhou | May 27, 2026 at 04:00 AM

🤖AI Summary

Researchers show that correct demonstrations in in-context learning do not guarantee better model performance: some accurate examples actually degrade accuracy. The study introduces task-preserving perturbations to show that exemplar utility depends on how demonstrations influence contextual inference, not merely on correctness, which challenges the common assumption that any correct example helps.

Analysis

This research challenges a common assumption about in-context learning: that providing correct examples necessarily improves model performance. The team shows that correctness and utility are decoupled, with some accurate demonstrations actively harming in-context learning outcomes. They attribute this to a phenomenon they call contextual evidence shift: task-preserving perturbations, which keep examples technically correct while changing input semantics, can substantially degrade performance, particularly for smaller models and complex tasks.

The work builds on growing recognition that in-context learning operates through mechanisms more complex than simple pattern matching. Previous research suggested that models learn from demonstrations through various inference pathways; this study quantifies how the composition and framing of examples alter those pathways. The researchers distinguish between label-updating perturbations (where task-relevant semantics change) and target-preserving perturbations (where original targets remain valid), which lets them investigate the gap between correctness and utility systematically.
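The two perturbation types can be illustrated with a toy example. The sketch below is not the authors' procedure: the addition task, the `Demo` class, the `perturb` helper, and the specific edits are all hypothetical stand-ins chosen only to show that both kinds of perturbation leave every demonstration correct, and to show what a perturbation ratio means in practice.

```python
import random
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Demo:
    """A toy addition demonstration (hypothetical task, for illustration only)."""
    a: int
    b: int
    answer: int
    context: str = ""

    def render(self) -> str:
        return f"{self.context}Q: What is {self.a} + {self.b}?\nA: {self.answer}"


def is_correct(d: Demo) -> bool:
    return d.answer == d.a + d.b


def label_updating(d: Demo, rng: random.Random) -> Demo:
    # Task-relevant content changes (an operand), so the label is updated to stay correct.
    new_b = d.b + rng.randint(1, 9)
    return replace(d, b=new_b, answer=d.a + new_b)


def target_preserving(d: Demo, rng: random.Random) -> Demo:
    # The input changes (an irrelevant sentence is added), but the original target stays valid.
    return replace(d, context="Note: it was raining that day. ")


def perturb(demos: List[Demo],
            fn: Callable[[Demo, random.Random], Demo],
            ratio: float,
            seed: int = 0) -> List[Demo]:
    """Apply `fn` to a random fraction `ratio` of the demonstrations."""
    rng = random.Random(seed)
    k = round(ratio * len(demos))
    chosen = set(rng.sample(range(len(demos)), k))
    return [fn(d, rng) if i in chosen else d for i, d in enumerate(demos)]


demos = [Demo(1, 2, 3), Demo(4, 5, 9), Demo(2, 2, 4), Demo(7, 1, 8)]
updated = perturb(demos, label_updating, ratio=0.5)
preserved = perturb(demos, target_preserving, ratio=0.5)
```

In both `updated` and `preserved`, every demonstration still passes `is_correct`, so a correctness check alone cannot tell the perturbed prompt from the original.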

For the AI development community, these findings have practical implications for prompt engineering and few-shot learning. Teams cannot rely on correctness verification alone when selecting or constructing demonstrations; they must also consider how each example influences the model's contextual inference. The degradation is most pronounced on harder tasks and at higher perturbation ratios, suggesting that task complexity amplifies sensitivity to exemplar composition. Practitioners designing in-context learning systems therefore need demonstration selection strategies that go beyond accuracy validation.
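One simple way to look past correctness is to measure what each demonstration actually does to held-out accuracy. The sketch below uses a leave-one-out ablation; this is a generic technique and not the method from the paper. The `Predictor` signature is a hypothetical wrapper around whatever model is being prompted, and `toy_predict` is a stub written only so the example runs.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (input text, gold label)
# Hypothetical model wrapper: takes the demonstrations and a query, returns a label.
Predictor = Callable[[Sequence[Example], str], str]


def accuracy(predict: Predictor,
             demos: Sequence[Example],
             eval_set: Sequence[Example]) -> float:
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    hits = sum(predict(demos, x) == y for x, y in eval_set)
    return hits / len(eval_set)


def leave_one_out_utility(predict: Predictor,
                          demos: Sequence[Example],
                          eval_set: Sequence[Example]) -> List[float]:
    """Accuracy with all demos minus accuracy with demo i removed.

    A negative score means the prompt does better without that demo,
    even if the demo itself is labeled correctly.
    """
    base = accuracy(predict, demos, eval_set)
    scores = []
    for i in range(len(demos)):
        without = list(demos[:i]) + list(demos[i + 1:])
        scores.append(base - accuracy(predict, without, eval_set))
    return scores


def toy_predict(demos: Sequence[Example], query: str) -> str:
    # Stub standing in for a real model: any demo starting with "noisy"
    # derails it into always answering "neg".
    if any(x.startswith("noisy") for x, _ in demos):
        return "neg"
    return "pos" if "good" in query else "neg"


demos = [("fine movie", "pos"), ("noisy but good", "pos")]
eval_set = [("good film", "pos"), ("bad film", "neg")]
scores = leave_one_out_utility(toy_predict, demos, eval_set)
```

Here both demonstrations carry correct labels, yet the second receives a negative score because removing it raises accuracy on the held-out set. With a real model, each score costs one extra evaluation pass, so this kind of ablation is practical mainly for small demonstration pools.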

Key Takeaways

→Correct demonstrations don't guarantee improved in-context learning performance; some accurate examples reduce model accuracy
→Contextual evidence shift explains how task-preserving perturbations degrade ICL by altering the evidence mixture used for inference
→Smaller models are more sensitive to these perturbations than larger models
→Exemplar utility depends on how examples influence contextual inference mechanisms, not solely on correctness
→Robust in-context learning requires evaluating demonstration impact on inference pathways, not just accuracy validation

#in-context-learning #machine-learning #demonstrations #model-behavior #prompt-engineering #language-models #inference-mechanisms

