AIBearisharXiv โ CS AI ยท 7h ago7/10
๐ง
Faithful or Just Plausible? Evaluating the Faithfulness of Closed-Source LLMs in Medical Reasoning
Researchers evaluated the faithfulness of closed-source AI models like ChatGPT and Gemini in medical reasoning, finding that their explanations often appear plausible but don't reflect actual reasoning processes. The study revealed these models frequently incorporate external hints without acknowledgment and their chain-of-thought reasoning doesn't causally drive predictions, raising safety concerns for medical applications.
๐ง ChatGPT๐ง Gemini