Supervised Fine-tuning with Synthetic Rationale Data Hurts Real-World Disease Prediction
A large-scale study challenges the widespread assumption that fine-tuning language models with synthetic explanations improves clinical prediction performance. Researchers found that rationale-based supervised fine-tuning consistently degraded Alzheimer's disease prediction accuracy compared to label-only approaches, despite the rationales being medically accurate and human-verified.
This research exposes a critical disconnect between intuitive machine learning practice and real-world performance in high-stakes clinical applications. With 504 configurations tested systematically, the study provides robust evidence that a widely adopted technique for improving model interpretability and reasoning undermines predictive accuracy in disease forecasting tasks.
The finding runs against a growing trend in which AI developers assume that teaching models to explain their reasoning improves overall performance. This assumption has driven significant investment in explainable AI (XAI) techniques, particularly in healthcare, where transparency and interpretability are regulatory and ethical priorities. Longitudinal Alzheimer's disease prediction is exactly the kind of application where stakeholders demand both accuracy and explainability.
For clinical AI developers and healthcare institutions, this research signals that the path to trustworthy AI may not be straightforward. The paper identifies a structural conflict between narrative plausibility (what sounds medically coherent) and discriminative optimization (the mathematical objective of accurate prediction). The distinction matters because a model trained on rationales learns to produce explanations that satisfy human expectations rather than focusing purely on predictive signals. Notably, the same rationales improved performance when used as in-context demonstrations during inference, suggesting the problem lies specifically in using them as training targets.
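To make the distinction concrete, the sketch below builds the three ways a rationale can enter the pipeline: a label-only fine-tuning target, a rationale-plus-label fine-tuning target, and an inference-time prompt that uses rationales only as demonstrations. It is a minimal illustration, not the study's code. The prompt template, field names, function names, and patient values are all hypothetical and invented for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Example:
    patient_summary: str  # hypothetical serialized patient record
    rationale: str        # synthetic explanation (invented text, not study data)
    label: str            # prediction target, e.g. "yes" or "no"


# Hypothetical prompt template; the study's actual wording is not known here.
PROMPT = "Patient record: {record}\nWill this patient progress to Alzheimer's disease?"


def label_only_pair(ex: Example) -> Tuple[str, str]:
    """Fine-tuning pair whose training target is the label alone."""
    return PROMPT.format(record=ex.patient_summary), f"Answer: {ex.label}"


def rationale_pair(ex: Example) -> Tuple[str, str]:
    """Fine-tuning pair whose training target is the rationale, then the label."""
    target = f"Reasoning: {ex.rationale}\nAnswer: {ex.label}"
    return PROMPT.format(record=ex.patient_summary), target


def in_context_prompt(demos: List[Example], query_summary: str) -> str:
    """Inference-time prompt: rationales appear only as worked demonstrations.

    No model weights are updated; the rationales are context, not targets.
    """
    blocks = []
    for demo in demos:
        prompt, target = rationale_pair(demo)
        blocks.append(f"{prompt}\n{target}")
    # The final block is the unanswered query the model must complete.
    blocks.append(PROMPT.format(record=query_summary))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demo = Example(
        patient_summary="age 74, MMSE 24, APOE4 carrier",  # invented values
        rationale="Declining MMSE and APOE4 status raise progression risk.",
        label="yes",
    )
    print(label_only_pair(demo)[1])
    print(rationale_pair(demo)[1])
    print(in_context_prompt([demo], "age 68, MMSE 29, non-carrier"))
```

In the first two cases the prompt is identical and only the training target changes; in the third, the same rationale text is moved out of the training data and into the prompt, which is the configuration the study reports as helpful.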
Looking forward, this work should catalyze more empirical investigation into when rationale-based supervision genuinely helps and when it introduces performance penalties. The healthcare AI sector may need to decouple explainability efforts from the fine-tuning process itself, exploring separate mechanisms that generate trustworthy explanations post hoc rather than as training objectives. Such a decoupling would also bear on regulatory expectations around AI transparency in clinical settings.
- Synthetic rationale fine-tuning consistently reduced prediction accuracy across 504 model configurations, contradicting widespread assumptions about its benefits.
- The performance degradation persisted across different model families and data scales, indicating a fundamental rather than incidental problem.
- Human experts verified that generated rationales were medically accurate and grounded in patient data, ruling out poor explanation quality as the cause.
- The same rationales improved performance when used as in-context demonstrations during inference rather than as training targets.
- The paper attributes the degradation to a structural conflict between narrative plausibility and discriminative optimization: models trained on rationales optimize for explanatory coherence over predictive accuracy.