From 'May' to 'Is': Certainty Distortion in Language Model Rewriting
Researchers have identified a systematic bias in how language models (LMs) handle certainty during rewriting: up to 75% of rewritten outputs show meaningful changes in how confidently a claim is expressed. Models are 1.5–2× more likely to increase expressed certainty than to decrease it, and the effect compounds with repeated paraphrasing. This creates risks for users who rely on LMs in high-stakes domains such as medicine and science.
Language models are increasingly trusted to process and rewrite information in domains where accuracy and precision matter. This research reveals a critical vulnerability: LMs systematically amplify the confidence of claims even when the underlying semantic content is otherwise unchanged. The pattern appears across model sizes and families, suggesting that it stems from fundamental characteristics of how these systems are trained and operate rather than from isolated implementation issues.
The asymmetry toward increased certainty is particularly concerning because it compounds over iterations. When a user paraphrases text multiple times (a common workflow in research, journalism, and medical communication), the confidence inflation accumulates and can turn a tentative finding into an assured conclusion. In medical contexts, claude-haiku-4-5 doubled its rate of certainty increases, from 20% to 40%, over just five iterations.
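To see how a modest per-step bias compounds, here is a toy calculation, not the study's method. It treats certainty as five ordered levels on a hypothetical scale running from "may" to "is". It assumes that each paraphrase moves a claim up one level with probability `p_up`, down one level with probability `p_down`, or leaves it unchanged. The values `p_up = 0.2` and `p_down = 0.1` are illustrative assumptions that encode a 2× upward bias; they are not fitted to the study's data. The functions `step`, `drift`, and `expected_level` are hypothetical names. The code propagates the exact probability distribution rather than sampling, so the result is deterministic.

```python
from typing import List


def step(dist: List[float], p_up: float, p_down: float) -> List[float]:
    """Apply one paraphrase to a probability distribution over certainty levels.

    Index 0 is the most hedged level and the last index is a flat assertion.
    A move past either end is clamped, so that mass stays on the boundary level.
    """
    n = len(dist)
    new = [0.0] * n
    for level, mass in enumerate(dist):
        new[min(level + 1, n - 1)] += mass * p_up       # certainty increased
        new[max(level - 1, 0)] += mass * p_down         # certainty decreased
        new[level] += mass * (1.0 - p_up - p_down)      # certainty unchanged
    return new


def drift(start_level: int, n_levels: int, p_up: float, p_down: float,
          iterations: int) -> List[List[float]]:
    """Return the distribution after 0, 1, ..., `iterations` paraphrases."""
    dist = [0.0] * n_levels
    dist[start_level] = 1.0
    history = [dist]
    for _ in range(iterations):
        dist = step(dist, p_up, p_down)
        history.append(dist)
    return history


def expected_level(dist: List[float]) -> float:
    """Mean certainty level under the distribution."""
    return sum(level * mass for level, mass in enumerate(dist))


# Hypothetical five-level scale; the ordering of the middle hedges is an assumption.
LEVELS = ["may", "possibly", "likely", "almost certainly", "is"]

if __name__ == "__main__":
    start = 2  # begin mid-scale so neither boundary dominates early steps
    # Illustrative parameters (2x upward bias), not estimates from the study.
    history = drift(start_level=start, n_levels=len(LEVELS), p_up=0.2,
                    p_down=0.1, iterations=5)
    for i, dist in enumerate(history):
        more_certain = sum(dist[start + 1:])  # P(claim ends above its starting level)
        print(f"iteration {i}: mean level {expected_level(dist):.2f}, "
              f"P(more certain than start) = {more_certain:.2f}")
```

Starting from the middle level with `p_up == p_down`, the expected level stays where it began. The upward drift in the biased run therefore comes entirely from the assumed asymmetry, and the probability of a claim ending up more assertive than its source grows with each pass.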
This finding has direct implications for knowledge workers and decision-makers who depend on LMs for text processing. In scientific research, medicine, finance, and policy, overstated certainty can distort risk assessment and lead to poor decisions. Prompt-based interventions partially mitigate the problem but do not eliminate it, which indicates that the issue runs deeper than surface-level instruction effects.
The research demonstrates that LM reliability extends beyond factual accuracy: it also covers how information is presented and qualified. Organizations deploying these tools in high-stakes contexts need safeguards designed specifically to detect and correct certainty distortion. Users should treat the confidence expressed in LM-rewritten content with particular skepticism.
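As one illustration of what such a safeguard might look like, the sketch below compares hedging and boosting cues between a source text and its rewrite. It flags any rewrite that drops hedges or adds boosters. The cue lists and the names `check_rewrite`, `count_cues`, and `CertaintyCheck` are hypothetical, and the approach is not the research's measurement method. A lexical count like this misses context (for example, "may" as a month, or hedges under negation). Treat it as a cheap first-pass filter, not a reliable detector.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical, deliberately small cue lists. A production safeguard would need a
# validated lexicon or a trained classifier instead.
HEDGES: FrozenSet[str] = frozenset({
    "may", "might", "could", "possibly", "perhaps", "likely", "suggest",
    "suggests", "appear", "appears", "preliminary", "tentatively",
})
BOOSTERS: FrozenSet[str] = frozenset({
    "clearly", "certainly", "definitely", "undoubtedly", "always", "prove",
    "proves", "demonstrate", "demonstrates", "confirm", "confirms",
})

WORD = re.compile(r"[a-z']+")


def count_cues(text: str, cues: FrozenSet[str]) -> int:
    """Count tokens in `text` that appear in the cue set (case-insensitive)."""
    return sum(1 for word in WORD.findall(text.lower()) if word in cues)


@dataclass
class CertaintyCheck:
    hedges_before: int
    hedges_after: int
    boosters_before: int
    boosters_after: int

    @property
    def flagged(self) -> bool:
        # Flag the rewrite if it lost hedges or gained boosters relative to the source.
        return (self.hedges_after < self.hedges_before
                or self.boosters_after > self.boosters_before)


def check_rewrite(original: str, rewrite: str) -> CertaintyCheck:
    """Compare hedge and booster counts between a source text and its rewrite."""
    return CertaintyCheck(
        hedges_before=count_cues(original, HEDGES),
        hedges_after=count_cues(rewrite, HEDGES),
        boosters_before=count_cues(original, BOOSTERS),
        boosters_after=count_cues(rewrite, BOOSTERS),
    )


if __name__ == "__main__":
    result = check_rewrite(
        "The drug may reduce symptoms in some patients.",
        "The drug reduces symptoms in patients.",
    )
    print(result, "flagged:", result.flagged)
```

On the example pair, the dropped "may" is enough to flag the rewrite. A count-based rule cannot tell whether a rewrite that keeps the same number of cues has attached them to a different claim, which is one reason a lexical check can only supplement human review.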
- Language models systematically increase the expressed certainty of claims during rewriting, with up to 75% of outputs showing meaningful changes in certainty
- Across model families and sizes, models are 1.5–2× more likely to inflate certainty than to reduce it
- Certainty distortion compounds over repeated paraphrasing, with the rate of certainty increases doubling in some medical-domain examples
- Prompt-based interventions reduce but do not eliminate the certainty inflation bias
- The findings have critical implications for users relying on LMs in high-stakes domains such as medicine, science, and finance