Robust Explanations for User Trust in Enterprise NLP Systems
Researchers propose a black-box robustness evaluation framework for NLP explanations, revealing that decoder-based LLMs produce 73% more stable explanations than encoder models like BERT. The study establishes practical cost-robustness tradeoffs that help organizations select models for compliance-sensitive applications before deployment.
Enterprise NLP systems increasingly require explainability for regulatory compliance and user trust, yet organizations deploying black-box APIs lack tools to validate explanation stability before production. This research addresses that gap with a systematic framework for measuring how explanations degrade under realistic perturbations (word swaps, deletions, shuffling, and back-translation) that simulate real-world noise. The framework uses leave-one-out occlusion to derive token-level explanations, and it operationalizes robustness as the top-token flip rate across perturbation severity levels.

The empirical findings reveal a clear architectural divide: decoder LLMs substantially outperform encoders in explanation stability, and stability scales predictably with model size. Moving from 7B to 70B parameters yields a 44% stability gain, establishing a quantifiable relationship between computational cost and explanation robustness.

This matters for regulated sectors where compliance officers require auditable, stable model decisions. Organizations migrating from encoder classifiers to decoder LLMs gain not only performance improvements but also more reliable explanations, and the cost-robustness tradeoff curve gives decision-makers concrete guidance on model selection weighed against inference expense. The systematic cross-architecture evaluation, spanning the BERT, RoBERTa, Qwen, and Llama families on 64,800 test cases, establishes credible baseline evidence. Similar robustness protocols could see adoption in compliance-sensitive domains such as healthcare, finance, and legal tech, where explanation consistency directly affects liability and trust.
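The two core measurements described above, leave-one-out occlusion scores and the top-token flip rate, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy classifier, function names, and tokenization are assumptions, and the only requirement on the model is black-box access to a scalar score.

```python
from typing import Callable, List, Tuple

def loo_scores(tokens: List[str], classify: Callable[[str], float]) -> List[float]:
    """Leave-one-out occlusion: each token's importance is the drop in the
    black-box score when that token is removed from the input."""
    base = classify(" ".join(tokens))
    return [base - classify(" ".join(tokens[:i] + tokens[i + 1:]))
            for i in range(len(tokens))]

def top_token(tokens: List[str], classify: Callable[[str], float]) -> str:
    """The token the explanation ranks as most important."""
    scores = loo_scores(tokens, classify)
    return tokens[scores.index(max(scores))]

def top_token_flip_rate(pairs: List[Tuple[List[str], List[str]]],
                        classify: Callable[[str], float]) -> float:
    """Fraction of (original, perturbed) pairs whose top token changes.
    Lower means more robust explanations."""
    flips = sum(top_token(orig, classify) != top_token(pert, classify)
                for orig, pert in pairs)
    return flips / len(pairs)

# Toy stand-in for a black-box sentiment API (assumption for illustration):
# scores 0.9 when "great" appears, else 0.1.
def toy_classify(text: str) -> float:
    return 0.9 if "great" in text.split() else 0.1
```

With `toy_classify`, the top token of `["the", "movie", "was", "great"]` is `"great"`; a synonym swap elsewhere in the sentence leaves it unchanged, while deleting `"great"` flips it. Aggregating flip rates across perturbation types and severity levels yields the robustness curves the study compares across architectures.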
- Decoder LLMs produce 73% more stable explanations than encoder models under realistic perturbations
- Model scale correlates with explanation robustness, with a 44% improvement from 7B to 70B parameters
- A black-box robustness evaluation framework enables pre-deployment validation using only API access, without model internals
- Cost-robustness tradeoff curves guide practical model selection for compliance-sensitive applications
- Explanation stability varies across perturbation types, requiring multi-angle robustness testing before production deployment
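The perturbation families named above (word swaps, deletions, and shuffling; back-translation needs an external translation model and is omitted here) can each be parameterized by a severity level, roughly the fraction of the input affected. The sketch below is a plausible reading of that setup, not the study's code; the synonym table and severity semantics are illustrative assumptions.

```python
import random
from typing import List

# Tiny illustrative synonym table (assumption); a real harness would use
# a thesaurus or embedding neighbors.
SYNONYMS = {"great": "excellent", "movie": "film", "bad": "poor"}

def word_swap(tokens: List[str], severity: float, rng: random.Random) -> List[str]:
    """Replace a severity-scaled fraction of swappable tokens with synonyms."""
    out = list(tokens)
    candidates = [i for i, t in enumerate(out) if t in SYNONYMS]
    k = max(1, round(severity * len(candidates))) if candidates else 0
    for i in rng.sample(candidates, min(k, len(candidates))):
        out[i] = SYNONYMS[out[i]]
    return out

def word_delete(tokens: List[str], severity: float, rng: random.Random) -> List[str]:
    """Drop a severity-scaled fraction of tokens at random positions."""
    k = max(1, round(severity * len(tokens)))
    drop = set(rng.sample(range(len(tokens)), k))
    return [t for i, t in enumerate(tokens) if i not in drop]

def word_shuffle(tokens: List[str], severity: float, rng: random.Random) -> List[str]:
    """Shuffle a contiguous window whose width grows with severity."""
    n = len(tokens)
    w = max(2, round(severity * n))
    start = rng.randrange(0, n - w + 1) if n > w else 0
    window = tokens[start:start + w]
    rng.shuffle(window)
    return tokens[:start] + window + tokens[start + w:]
```

Running each perturbation at several severity levels against the same inputs, then comparing top-token flip rates per perturbation type, is what makes the multi-angle testing in the last bullet concrete: a model can be robust to synonym swaps yet fragile to deletions.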