AINeutralarXiv – CS AI · 7h ago7/10
🧠
Same Patient, Different Words, Different Diagnosis? Evaluating Semantic Stability in Clinical LLMs
Researchers propose a semantic verification framework to evaluate robustness of clinical LLMs against prompt variations that preserve meaning. Testing 16 models reveals that domain-specific medical models show mixed results compared to general-purpose counterparts, with sensitivity to rephrasing posing safety risks in healthcare applications.