
Internal Representation, Not Clinical Knowledge: Where Apparent LLM Triage Failures Originate

arXiv – CS AI | David Fraile Navarro, Berardino Como, Jialei Sheng, Soundariya Ananthan, Shlomo Berkovsky | May 29, 2026 at 04:00 AM

🤖 AI Summary

Researchers discovered that large language model failures in clinical triage stem from output formatting constraints rather than deficient medical knowledge. Using sparse autoencoders to analyze model internals, they found medical features activate identically across free-text and multiple-choice formats, but scaffold features drive incorrect decisions at the decision token, suggesting the models possess clinical understanding but struggle with constrained response structures.

Analysis

This research addresses the gap between what LLMs know and how they express that knowledge under different output constraints. The study used interpretability techniques (sparse autoencoders, logit attribution, and feature analysis) to inspect internal representations in Gemma and Qwen models. Rather than finding degraded clinical reasoning, the researchers found that medical features fire consistently regardless of output format, which indicates that the models encode the clinically relevant content of each patient case. The failure emerges at the decision token, where format-related scaffold features override that medical signal.
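The format comparison described above can be illustrated with a toy example. This is a minimal sketch, not the paper's pipeline: it assumes a common sparse autoencoder encoder form, `ReLU((x - b_dec) @ W_enc + b_enc)`, and uses random weights and random vectors as stand-ins for a trained autoencoder and real residual-stream activations. All names (`sae_encode`, `x_free_text`, `x_multiple_choice`) are hypothetical.

```python
import numpy as np

def sae_encode(x, W_enc, b_enc, b_dec):
    """Map a residual-stream vector to non-negative SAE feature activations.

    Assumed encoder form: ReLU((x - b_dec) @ W_enc + b_enc).
    """
    return np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)

def cosine_similarity(a, b):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Hypothetical stand-ins: random weights instead of a trained autoencoder.
rng = np.random.default_rng(0)
d_model, n_features = 16, 64
W_enc = rng.normal(size=(d_model, n_features))
b_enc = np.zeros(n_features)
b_dec = np.zeros(d_model)

# Stand-ins for activations on the same case under two prompt formats.
x_free_text = rng.normal(size=d_model)
x_multiple_choice = x_free_text + 0.01 * rng.normal(size=d_model)

f_free = sae_encode(x_free_text, W_enc, b_enc, b_dec)
f_mc = sae_encode(x_multiple_choice, W_enc, b_enc, b_dec)

# A value near 1 means the two formats activate nearly the same features.
similarity = cosine_similarity(f_free, f_mc)
print(f"feature similarity across formats: {similarity:.3f}")
```

In a real analysis the two input vectors would come from the model's hidden states on the case description, and the comparison would typically be restricted to the features identified as medical.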

The findings reframe how the AI community should interpret LLM performance on constrained tasks. Previous benchmarks that reported high under-triage rates presumed knowledge gaps; this work argues that the problem is mechanistic rather than conceptual. The off-by-one errors (selecting an adjacent acuity level) and the sensitivity to option order indicate that the models struggle with mapping a case to a response option, not with diagnosis. This distinction matters for deploying medical AI systems, because a mapping failure points to changes in output format rather than to more medical training.
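The error patterns named above can be measured with a few lines of code. This is a minimal sketch under stated assumptions: acuity levels are integers where a lower number means more urgent, so under-triage means predicting a higher number than the reference label. The function names and the sample labels are hypothetical and are not taken from the paper.

```python
def triage_error_profile(gold, predicted):
    """Summarize triage errors for integer acuity levels (lower = more urgent)."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equal length")
    errors = [(g, p) for g, p in zip(gold, predicted) if g != p]
    n = len(gold)
    # Off-by-one: the model picked an acuity level adjacent to the reference.
    off_by_one = sum(1 for g, p in errors if abs(g - p) == 1)
    # Under-triage: the model rated the case as less urgent than the reference.
    under = sum(1 for g, p in errors if p > g)
    return {
        "accuracy": (n - len(errors)) / n,
        "under_triage_rate": under / n,
        "off_by_one_share": off_by_one / len(errors) if errors else 0.0,
    }

def order_sensitivity(pred_original, pred_shuffled):
    """Fraction of cases whose answer changes when the options are reordered."""
    if len(pred_original) != len(pred_shuffled) or not pred_original:
        raise ValueError("prediction lists must be non-empty and equal length")
    flips = sum(1 for a, b in zip(pred_original, pred_shuffled) if a != b)
    return flips / len(pred_original)

# Hypothetical labels for six cases.
gold = [1, 2, 3, 3, 4, 5]
predicted = [1, 3, 3, 4, 4, 3]
profile = triage_error_profile(gold, predicted)
print(profile)
```

A high `off_by_one_share` alongside a nonzero `order_sensitivity` is the kind of pattern the article attributes to decision mapping rather than to missing clinical knowledge.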

For developers building clinical decision-support tools, the research suggests that output format design strongly influences reliability, possibly more than model scale or training data. Multiple-choice constraints that seem natural for standardized benchmarking may degrade a model's ability to express its actual reasoning. This finding supports evaluation frameworks that preserve the decision-making process. The work also highlights the value of interpretability for debugging apparent AI failures and for distinguishing representation problems from communication problems, which becomes essential as these systems approach real-world medical deployment.

Key Takeaways

→ LLM clinical triage failures originate from output formatting constraints, not deficient medical knowledge or reasoning.
→ Sparse autoencoders revealed medical features activate identically across free-text and multiple-choice formats but remain silent at decision tokens.
→ Off-by-one errors dominate failures rather than complete knowledge gaps, indicating decision mapping problems over diagnostic incompetence.
→ Output format design significantly influences model reliability and may matter more than scale or training data for medical AI systems.
→ Interpretability techniques can distinguish between representation failures and communication failures, critical for clinical AI deployment.

#llm-interpretability #clinical-ai #sparse-autoencoders #medical-benchmarks #output-formatting #model-internals #decision-making #ai-safety

