AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
Applied Explainability for Large Language Models: A Comparative Study
Researchers compare three explainability techniques (Integrated Gradients, Attention Rollout, and SHAP) for interpreting LLM decisions on sentiment classification tasks. The study finds that gradient-based methods offer stability and interpretability, while attention-based approaches are faster but less predictive, highlighting critical trade-offs in choosing explanation methods for transformer models.
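
To make the gradient-based side of that comparison concrete, here is a minimal sketch of Integrated Gradients. It is illustrative only and is not the study's code: the three-feature logistic scorer, its weights `w` and bias `b`, the all-zeros baseline, and the step count are all assumptions chosen so the example runs with NumPy alone. A real LLM setup would attribute over token embeddings and obtain gradients by automatic differentiation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def model(x, w, b):
    """Hypothetical toy scorer: logistic regression over a feature vector."""
    return sigmoid(x @ w + b)


def model_grad(x, w, b):
    """Analytic gradient of the toy model's output with respect to x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    Gradients are averaged along the straight line from `baseline` to `x`,
    then scaled elementwise by (x - baseline).
    """
    alphas = (np.arange(steps) + 0.5) / steps              # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)     # shape (steps, d)
    grads = np.stack([model_grad(point, w, b) for point in path])
    return (x - baseline) * grads.mean(axis=0)


# Assumed toy parameters and input; none of these come from the study.
w = np.array([1.5, -2.0, 0.3])
b = 0.1
x = np.array([1.0, 0.5, 2.0])
baseline = np.zeros_like(x)

attributions = integrated_gradients(x, baseline, w, b)

# Completeness: attributions should sum to f(x) - f(baseline),
# up to the numerical error of the Riemann sum.
completeness_gap = abs(
    attributions.sum() - (model(x, w, b) - model(baseline, w, b))
)
```

Because the attributions sum to the change in model output between the baseline and the input, the result can be checked numerically, which is one reason gradient-based attributions are often described as stable. The choice of baseline is a design decision that changes the attributions, so it should be reported alongside them.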