Confident but Conflicted: Internal Uncertainty and Cognitive Dissonance Resolution in LLMs
Researchers have developed Trust Elasticity (TE), a metric that measures how readily large language models change their outputs when presented with conflicting evidence. The study finds that internal uncertainty indicators, such as confidence miscalibration, correlate with differences in how LLMs resolve cognitive dissonance, which suggests that future AI safety interventions could target these measurable internal properties.
This research addresses a critical gap in understanding how LLMs respond to contradictory information. Rather than treating model behavior as a black box, the authors systematically examined the internal mechanisms underlying cognitive dissonance resolution—the process by which models encounter conflicting inputs and decide whether to maintain, adjust, or reject their previous outputs. By varying source authority and evidence quality across health-science claims, the team established a controlled framework for measuring persuasion resistance and susceptibility.
Trust Elasticity, a concept borrowed from econometrics, quantifies how readily a model shifts its position. The metric revealed substantial cross-model variation, although clearly false claims showed near-zero elasticity across all tested models. The key finding links this behavioral variation to measurable internal uncertainty proxies: for Qwen models, susceptibility to persuasion correlated with confidence miscalibration, while for Llama it correlated with changes in internal uncertainty.
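The summary does not state the formula behind Trust Elasticity, so the following is only a sketch of what an econometrics-style elasticity could look like here. It assumes the textbook midpoint (arc) elasticity: the percentage change in a model's stated belief divided by the percentage change in evidence strength. The function name, the parameter names, and the 0-to-1 evidence-strength scale are all hypothetical and are not taken from the paper.

```python
def trust_elasticity(belief_before: float, belief_after: float,
                     evidence_before: float, evidence_after: float) -> float:
    """Midpoint (arc) elasticity of belief with respect to evidence strength.

    ASSUMPTION: this is the generic economics formula, not the paper's
    definition. All inputs are hypothetical scores on a positive scale.
    """
    belief_mid = (belief_before + belief_after) / 2
    evidence_mid = (evidence_before + evidence_after) / 2
    if belief_mid == 0 or evidence_mid == 0:
        raise ValueError("midpoints must be non-zero")

    pct_belief_change = (belief_after - belief_before) / belief_mid
    pct_evidence_change = (evidence_after - evidence_before) / evidence_mid
    if pct_evidence_change == 0:
        raise ValueError("evidence strength did not change")

    return pct_belief_change / pct_evidence_change


# Hypothetical numbers: confidence in the original answer drops from 0.8 to 0.6
# as the strength of the counter-evidence rises from 0.2 to 0.8.
shifted = trust_elasticity(0.8, 0.6, 0.2, 0.8)

# A model that does not move at all has zero elasticity.
unmoved = trust_elasticity(0.05, 0.05, 0.2, 0.8)
```

Under this assumed definition, a value near zero means the model holds its position regardless of the evidence presented, which matches the near-zero elasticity reported for clearly false claims.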
These findings matter for AI development and deployment. Model reliability in critical domains such as healthcare, finance, and legal guidance depends partly on understanding how susceptible systems are to manipulation or correction. The finding that internal uncertainty indicators correlate with persuasion patterns opens pathways for targeted interventions that could improve robustness without sacrificing adaptability. Developers might then calibrate models for different risk contexts: conservative uncertainty handling for high-stakes applications versus more elastic updating for exploratory use cases.
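Confidence miscalibration, one of the internal indicators mentioned above, is often quantified with expected calibration error (ECE). The summary does not say which calibration measure the study used, so the sketch below shows standard equal-width-bin ECE purely as an illustration; the function name and the sample data are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |avg confidence - accuracy| per bin.

    ASSUMPTION: a generic calibration measure, not necessarily the one
    used in the study. `confidences` are values in [0, 1]; `correct`
    holds one boolean per prediction.
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")

    # Assign each prediction to a confidence bin; 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Hypothetical overconfident model: 90% stated confidence, 50% accuracy.
overconfident = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, True, False, False]
)
```

A larger ECE indicates a wider gap between what a model says it knows and how often it is right, which is the kind of internal property the study relates to persuasion behavior.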
Future work should examine whether similar internal mechanisms apply across diverse claim domains and whether external uncertainty interventions can effectively reduce undesired persuasion while preserving beneficial evidence incorporation. The research suggests that LLM behavior under conflicting evidence is not arbitrary but tied to measurable internal properties, which gives model safety and reliability work a concrete target.
- Trust Elasticity quantifies how readily LLMs abandon prior outputs when presented with conflicting evidence, revealing substantial cross-model variation.
- Internal uncertainty indicators, such as confidence miscalibration in Qwen and uncertainty change in Llama, correlate with behavioral susceptibility to persuasion.
- Clearly false claims elicit near-zero persuasion elasticity across all models tested, suggesting baseline resistance to obvious misinformation.
- The research bridges behavioral observations and internal model properties, enabling future interventions that target measurable uncertainty metrics rather than external factors alone.
- Understanding cognitive dissonance resolution mechanisms has direct implications for LLM reliability in high-stakes domains like healthcare and finance.