arXiv CS AI · 7h ago
Applied Explainability for Large Language Models: A Comparative Study
Researchers compare three explainability techniques (Integrated Gradients, Attention Rollout, and SHAP) for interpreting LLM decisions on sentiment classification tasks. The study reveals that gradient-based methods offer stability and interpretability, while attention-based approaches are faster but less predictive, highlighting critical trade-offs in choosing explanation methods for transformer models.
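To make the gradient-based approach concrete, here is a minimal sketch of Integrated Gradients on a toy differentiable classifier. This is not the paper's setup: the model (a logistic unit standing in for a sentiment classifier), its weights, the zero baseline, and the step count are all illustrative assumptions. The key property shown is the completeness axiom: attributions sum to the difference in model output between the input and the baseline.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, w):
    # Toy stand-in for a sentiment classifier: P(positive) for feature vector x.
    return sigmoid(np.dot(w, x))

def grad_model(x, w):
    # Analytic gradient of sigmoid(w . x) with respect to x.
    p = model(x, w)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, steps=100):
    # Midpoint Riemann-sum approximation of the path integral
    # from the baseline to the input, scaled by (x - baseline).
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack(
        [grad_model(baseline + a * (x - baseline), w) for a in alphas]
    )
    return (x - baseline) * grads.mean(axis=0)

# Illustrative weights and input (hypothetical values).
w = np.array([1.5, -2.0, 0.5])
x = np.array([0.8, 0.3, 1.0])
baseline = np.zeros_like(x)

attr = integrated_gradients(x, baseline, w)
# Completeness check: sum of attributions should match F(x) - F(baseline).
print(attr, attr.sum(), model(x, w) - model(baseline, w))
```

In practice one would use an autograd framework rather than hand-written gradients, but the per-feature attribution and the completeness check carry over unchanged, which is one reason gradient-based methods are often described as more stable than attention-based ones.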