AI · Neutral · arXiv – CS AI · 15h ago · 6/10
Faithfulness Evaluation for Decoder-only LLM Attributions with Controlled Retained Information
Researchers propose π-Soft-NC and π-Soft-NS, improved metrics for evaluating the faithfulness of input attribution methods in large language models. Both metrics control for the number of retained words, addressing a fundamental bias in existing faithfulness evaluation frameworks. The researchers also introduce Grad-ELLM, an attribution method designed for decoder-only LLMs that combines gradient and attention information for stronger explanatory performance.
🧠 Llama