Sparse Autoencoder Decomposition of Clinical Sequence Model Representations: Feature Complexity, Task Specialisation, and Mortality Prediction

arXiv – CS AI | Chris Sainsbury, Feng Dong, Andreas Karwath
🤖 AI Summary

Researchers applied sparse autoencoders to a clinical sequence model trained on electronic health records, revealing how the model abstracts medical information across layers. While SAE features outperformed dense representations for mortality prediction in full-sequence settings, dense representations proved superior in clinically relevant scenarios with temporal constraints, suggesting interpretability gains may not translate to practical clinical improvements.

Analysis

This research addresses a critical gap in mechanistic interpretability for clinical AI systems. Sparse autoencoders have gained traction in understanding large language models, but their application to clinical foundation models remained unexplored until this study. The researchers trained TopK SAEs on FlatASCEND, a 14.5-million-parameter clinical model, across both outpatient and ICU datasets, discovering progressive complexity in learned representations from layer zero to layer six.
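
The paper's exact SAE configuration isn't given here, but the core TopK mechanism is standard: encode a layer's activation vector, keep only the k largest latent pre-activations, zero the rest, and reconstruct. A minimal PyTorch sketch under that assumption (d_model, d_dict, and k are illustrative placeholders, not FlatASCEND's values):

```python
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    """Minimal TopK sparse autoencoder: hard sparsity via the k
    largest pre-activations per example (all sizes illustrative)."""
    def __init__(self, d_model: int, d_dict: int, k: int):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model, bias=False)

    def forward(self, x):
        pre = self.encoder(x)                           # (batch, d_dict)
        topk = torch.topk(pre, self.k, dim=-1)          # keep k largest
        acts = torch.zeros_like(pre).scatter_(-1, topk.indices, topk.values)
        return self.decoder(acts), acts                 # reconstruction, codes

sae = TopKSAE(d_model=512, d_dict=4096, k=32)
x = torch.randn(8, 512)             # stand-in for one layer's activations
recon, acts = sae(x)
loss = ((recon - x) ** 2).mean()    # reconstruction objective
```

Training one such SAE per transformer layer is what makes the layer 0 to layer 6 complexity comparison possible in the first place.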

The findings reveal a representational trade-off with significant implications for clinical AI deployment. While SAE-decomposed features achieved marginally better performance on mortality prediction (0.871 AUC on eICU), this advantage vanished under clinically realistic constraints where temporal leakage is controlled. Dense representations consistently outperformed SAE features in these leakage-safe windows (0.880 versus 0.871 on eICU, 0.914 versus 0.836 on MIMIC-IV), indicating that interpretability gains from sparse decomposition may come at the cost of predictive utility in real-world deployment scenarios.
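
Methodologically, the dense-versus-SAE comparison comes down to the same linear probing protocol applied to two kinds of per-patient vectors, with the leakage-safe condition amounting to pooling only events recorded before the prediction cutoff. A hedged sketch of such a probe (the function name and data shapes are illustrative, not the paper's code):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def probe_auc(train_reps, train_y, test_reps, test_y):
    """Fit a linear mortality probe on pooled representations and
    report AUC; the same protocol scores dense hidden states and
    SAE feature activations alike."""
    clf = LogisticRegression(max_iter=1000).fit(train_reps, train_y)
    return roc_auc_score(test_y, clf.predict_proba(test_reps)[:, 1])

# Leakage-safe setting: reps must be pooled from events *before*
# the prediction cutoff, so no post-outcome tokens reach the probe.
rng = np.random.default_rng(0)
reps = rng.normal(size=(200, 512))       # toy per-patient vectors
y = rng.integers(0, 2, size=200)         # toy mortality labels
print(probe_auc(reps[:150], y[:150], reps[150:], y[150:]))
```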

The low feature reproducibility rate of 21% across random seeds raises important concerns about mechanistic interpretability claims: individual features should be treated as illustrative rather than as stable, discoverable patterns in clinical reasoning. The delta-mode intervention method reduced noise substantially but still failed to produce statistically significant perturbation effects, suggesting that establishing causal feature importance in clinical models remains technically challenging.
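
A common way to put a number like 21% on cross-seed reproducibility is to match each feature's decoder direction against a second seed's dictionary and count near-duplicates; the paper's exact matching criterion isn't stated in this summary, so the sketch below is one plausible reading (the cosine threshold is an assumption):

```python
import numpy as np

def matched_fraction(dec_a, dec_b, thresh=0.9):
    """Fraction of seed-A features whose decoder direction has a
    near-duplicate (cosine >= thresh) among seed-B features; one
    possible cross-seed reproducibility metric, threshold assumed."""
    a = dec_a / np.linalg.norm(dec_a, axis=1, keepdims=True)
    b = dec_b / np.linalg.norm(dec_b, axis=1, keepdims=True)
    best = (a @ b.T).max(axis=1)     # best cosine match per A-feature
    return float((best >= thresh).mean())

# Random dictionaries share almost no directions, so this reads ~0.
rng = np.random.default_rng(0)
print(matched_fraction(rng.normal(size=(1024, 512)),
                       rng.normal(size=(1024, 512))))
```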

These results have direct implications for clinical AI adoption. Organizations seeking to deploy interpretable models may need to accept performance trade-offs, or alternatively, rely on dense representations with post-hoc explanation methods rather than intrinsically interpretable sparse decompositions. Future work should focus on closing the performance gap between sparse and dense approaches in clinical settings.

Key Takeaways
  • Sparse autoencoders show interpretability benefits but underperform dense models in clinically realistic temporal settings
  • Feature reproducibility is only 21% across random seeds, limiting confidence in mechanistic interpretability claims
  • Progressive abstraction occurs across transformer depth, from token detection at layer 0 to multi-category concepts at layer 6
  • Dense representations consistently outperform SAE features when temporal leakage is properly controlled
  • Delta-mode interventions reduce noise 86x but fail to demonstrate formal significance in perturbation effects