ProMed: Shapley Information Gain Guided Reinforcement Learning for Proactive Medical LLMs
ProMed introduces a reinforcement learning framework that transforms medical LLMs from reactive to proactive systems, using Shapley Information Gain (SIG) to guide clinical questioning. The approach achieves a 54.45% improvement over baseline reactive models and a 6.29% average improvement over state-of-the-art methods, and it generalizes to out-of-domain medical cases.
ProMed addresses a critical limitation in current medical AI systems: their tendency to provide answers before gathering sufficient clinical context. This reactive behavior mirrors inexperienced practitioners rushing to diagnosis rather than conducting thorough patient interviews. The framework's innovation lies in its Shapley Information Gain reward mechanism, which mathematically quantifies how much value a question adds by considering both the information it reveals and its contextual importance within the clinical decision space.
The research builds on longstanding principles in medical education and game theory. Systematic information gathering is a core part of clinical practice, and Shapley values, borrowed from cooperative game theory, provide a principled way to assign credit to individual questions based on their marginal contribution to diagnostic certainty. This goes beyond simpler information-theoretic metrics that score a question without accounting for how its value depends on the other questions asked.
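To make the credit-assignment idea concrete, the sketch below computes exact Shapley values for a handful of questions by averaging each question's marginal contribution over every possible asking order. It is an illustration of the general Shapley value only, not ProMed's actual SIG reward: the function names, the question labels, and the toy "diagnostic certainty" scores are all hypothetical assumptions made for this example.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings.

    `players` is a list of hashable items (here, question labels).
    `value` maps a frozenset of players to a number.
    """
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        asked = frozenset()
        for p in order:
            with_p = asked | {p}
            # Marginal contribution of p given the questions asked before it.
            totals[p] += value(with_p) - value(asked)
            asked = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


def toy_certainty(asked):
    """Hypothetical value function: diagnostic certainty after a set of questions.

    Assumption for illustration: 'travel' is only informative once 'onset'
    is also known, so those two questions share a synergy term.
    """
    score = 0.0
    if "onset" in asked:
        score += 0.2
    if "fever" in asked:
        score += 0.1
    if "onset" in asked and "travel" in asked:
        score += 0.3
    return score


questions = ["onset", "fever", "travel"]
phi = shapley_values(questions, toy_certainty)
for question, credit in phi.items():
    print(f"{question}: {credit:.3f}")
```

In this toy setup the synergy between "onset" and "travel" is split evenly between them, and the credits sum to the certainty reached when all questions are asked. Exact enumeration grows factorially with the number of questions, so a practical system would presumably rely on sampling or another approximation.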
The practical implications for healthcare AI are substantial. A 54.45% performance gain over reactive baselines could translate to more reliable clinical decision support, potentially reducing diagnostic errors caused by incomplete information. The two-stage training pipeline first uses Monte Carlo Tree Search for initialization and then applies SIG-guided optimization, balancing exploration with targeted improvement.
The robustness to out-of-domain cases suggests the framework captures generalizable principles about clinical inquiry rather than overfitting to specific training conditions. For healthcare organizations deploying medical AI, ProMed signals that LLM capabilities depend critically on interaction paradigms, not just model scale. This shifts the focus toward behavioral training rather than pure architectural improvements.
- ProMed achieves a 54.45% performance improvement by training medical LLMs to ask questions proactively rather than react passively.
- Shapley Information Gain quantifies the clinical value of a question by measuring both the new information it reveals and its contextual importance.
- The two-stage training pipeline combines Monte Carlo Tree Search initialization with SIG-augmented policy optimization.
- The framework demonstrates a 6.29% average improvement over state-of-the-art methods and generalizes to out-of-domain medical scenarios.
- The approach addresses a fundamental limitation in current medical AI: premature decision-making without sufficient patient information gathering.