AINeutralarXiv – CS AI · 14h ago6/10
🧠
Internal Representation, Not Clinical Knowledge: Where Apparent LLM Triage Failures Originate
Researchers discovered that large language model failures in clinical triage stem from output formatting constraints rather than deficient medical knowledge. Using sparse autoencoders to analyze model internals, they found medical features activate identically across free-text and multiple-choice formats, but scaffold features drive incorrect decisions at the decision token, suggesting the models possess clinical understanding but struggle with constrained response structures.