Lung-R1: A Knowledge Graph-Guided LLM for Pulmonary Diagnostic Reasoning
Researchers introduce Lung-R1, an LLM specialized in pulmonary disease diagnosis that integrates a structured knowledge graph (LungKG) containing 59,038 nodes and 164,308 edges to enable patient-specific diagnostic reasoning from electronic medical records. The model achieves state-of-the-art performance on diagnostic tasks, demonstrating that grounding LLMs with domain-specific knowledge graphs significantly improves clinical reasoning over general knowledge recall.
Lung-R1 addresses a critical limitation in applying large language models to medical diagnosis: the gap between general medical knowledge and patient-specific clinical reasoning. While LLMs can answer medical knowledge questions, they struggle with the heterogeneous patient data, phenotypic variability, and disease overlap that characterize real diagnostic scenarios. The research introduces LungKG, a structured pulmonary knowledge graph with 15 entity types and 112 relation types, which serves as the foundation for training a specialized 14-billion-parameter model through knowledge-constrained reasoning chains and reinforcement learning.
This work reflects a broader industry shift toward specialized AI systems that combine general language understanding with domain-specific knowledge structures. Rather than relying solely on training data to capture clinical relationships, the researchers model diagnostic relationships explicitly, so the model can reason over patient evidence along those relationships. In a 20-system evaluation, Lung-R1-14B outperforms comparable baselines on diagnostic tasks.
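To make the idea of reasoning over explicit diagnostic relationships concrete, here is a minimal Python sketch of a typed knowledge graph and a relation-aware evidence lookup. It is an illustration only, not the Lung-R1 training or inference method: the triples, the relation names (`has_symptom`, `has_risk_factor`, and so on), and the helper functions `build_index` and `rank_candidates` are all hypothetical and are not taken from LungKG.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    head: str      # disease entity
    relation: str  # typed relation (hypothetical names, not LungKG's schema)
    tail: str      # finding entity: symptom, risk factor, lab or imaging result


# Hypothetical toy triples for illustration; not drawn from LungKG.
TOY_EDGES = [
    Edge("pneumonia", "has_symptom", "fever"),
    Edge("pneumonia", "has_symptom", "productive cough"),
    Edge("pneumonia", "has_imaging_finding", "consolidation"),
    Edge("pulmonary embolism", "has_symptom", "dyspnea"),
    Edge("pulmonary embolism", "has_symptom", "pleuritic chest pain"),
    Edge("pulmonary embolism", "has_lab_finding", "elevated D-dimer"),
    Edge("COPD", "has_symptom", "dyspnea"),
    Edge("COPD", "has_symptom", "productive cough"),
    Edge("COPD", "has_risk_factor", "smoking"),
]


def build_index(edges):
    """Map each finding (edge tail) to the (disease, relation) pairs that point at it."""
    index = defaultdict(list)
    for edge in edges:
        index[edge.tail].append((edge.head, edge.relation))
    return index


def rank_candidates(findings, index):
    """Group the patient's findings by candidate disease, keeping the relation type."""
    support = defaultdict(list)
    for finding in findings:
        for disease, relation in index.get(finding, []):
            support[disease].append((relation, finding))
    # Most supporting edges first; ties broken alphabetically for a stable order.
    return sorted(support.items(), key=lambda item: (-len(item[1]), item[0]))


# Findings as they might be extracted from an EMR (hypothetical patient).
patient_findings = ["dyspnea", "productive cough", "smoking"]
ranking = rank_candidates(patient_findings, build_index(TOY_EDGES))

for disease, evidence in ranking:
    print(disease, evidence)
```

Each candidate keeps the typed edges that support it, so overlapping symptoms such as dyspnea count toward several diseases while a risk-factor edge separates them. Counting edges is a deliberately crude score chosen for brevity; a real system would weight relation types and handle negated or absent findings.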
For healthcare AI development, this approach offers a replicable template for other medical specialties and disease domains. The emphasis on EMR-grounded diagnosis suggests clinical adoption pathways, as hospital systems increasingly seek tools that integrate with existing record systems. Knowledge graph construction requires domain expertise but creates reusable assets that benefit future model iterations. The results support the view that specialized medical LLMs can outperform general-purpose models on clinical tasks, which is likely to accelerate investment in vertical-specific language models across healthcare.
- LungKG is the first structured pulmonary knowledge graph containing 59,038 nodes and 164,308 edges designed for diagnostic reasoning.
- Lung-R1 achieves state-of-the-art performance on pulmonary diagnosis benchmarks by grounding LLM reasoning in structured medical knowledge.
- Knowledge graph-guided training through constraint-based chain construction and reinforcement learning proves more effective than standard LLM approaches for clinical diagnosis.
- The methodology establishes a replicable template for developing specialized medical LLMs in other disease domains and specialties.
- Patient-specific diagnostic reasoning from EMRs requires relation-aware evidence integration, not isolated knowledge recall from training data.