AIBullisharXiv – CS AI · 8h ago7/10
🧠
A Monosemantic Attribution Framework for Stable Interpretability in Clinical Neuroscience Transformer-Based Language Models
Researchers have developed a monosemantic attribution framework to improve interpretability of Transformer-based language models in clinical applications, particularly for Alzheimer's disease diagnosis. The framework addresses instability in existing attribution methods by reducing inter-method variability and providing stable, explicit importance scores for model predictions.