AINeutralarXiv – CS AI · 18h ago6/10
🧠
Explaining Black-Box Language Models: Learning to Optimize Linguistically-Structured Word Subsets
Researchers propose a novel method for explaining black-box language model predictions by identifying linguistically-structured word subsets without requiring access to internal model parameters or gradients. The approach uses reinforcement learning and graph-based linguistic knowledge to generate interpretable, efficient explanations that outperform existing methods across multiple architectures and datasets.