Show, Don't TELL: Explainable AI-Generated Text Detection

arXiv – CS AI | Aldan Creo, Suraj Ranganath | May 28, 2026 at 04:00 AM

AI Summary

Researchers have developed TELL, an AI-generated text detector built for explainability: instead of returning only an opaque numerical score, it shows users the specific linguistic markers that point to AI or human authorship. The system achieves competitive detection performance (AUROC 0.927), and in human evaluation its explanations reach a 72.3% mean win-rate across quality criteria, reframing detection as a human-centric interpretability problem.

Analysis

Detecting AI-generated text has become increasingly important as language models proliferate, yet existing detectors share a critical usability gap: they output confidence scores without actionable reasoning. TELL addresses this mismatch between technical capability and practical utility by building explainability into the detection architecture itself rather than treating it as an afterthought.

The research builds on a growing recognition that black-box AI systems fail to meet institutional needs. Academic integrity officers, content moderators, and journalists require not just predictions but defensible rationales. TELL is first trained with supervised fine-tuning (SFT) on domain-specific authorship annotations, then refined with GRPO (Group Relative Policy Optimization) under a curriculum-learning schedule, a recipe that targets explanation quality rather than raw accuracy alone.
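The reinforcement-learning stage can be illustrated with a minimal sketch. GRPO, as generally described, samples several completions per prompt and scores each one relative to its own group (reward minus the group mean, divided by the group standard deviation), which removes the need for a learned value model. The code below shows only that advantage step plus an easy-to-hard curriculum ordering. The reward function `label_reward` and the `difficulty` field are hypothetical stand-ins, since TELL's actual rewards and curriculum criteria are not described here, and the policy-gradient update itself is omitted.

```python
from statistics import mean, pstdev


def label_reward(predicted: str, gold: str) -> float:
    """Hypothetical reward: 1.0 when a sampled verdict matches the gold label."""
    return 1.0 if predicted == gold else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each completion's reward by the mean
    and (population) standard deviation of its own group of samples."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def curriculum_order(examples: list[dict]) -> list[dict]:
    """Easy-to-hard ordering by a hypothetical per-example 'difficulty' score."""
    return sorted(examples, key=lambda ex: ex["difficulty"])


if __name__ == "__main__":
    # Toy training pool; texts and difficulty scores are made up for illustration.
    pool = [
        {"text": "sample B", "gold": "human", "difficulty": 0.8},
        {"text": "sample A", "gold": "ai", "difficulty": 0.2},
    ]
    for example in curriculum_order(pool):
        # Pretend the policy produced four verdicts for this example.
        sampled = ["ai", "human", "ai", "ai"]
        rewards = [label_reward(p, example["gold"]) for p in sampled]
        print(example["text"], group_relative_advantages(rewards))
```

For a group of four sampled verdicts where three match the gold label, the matching samples receive positive advantages and the mismatched one a negative advantage. If every sample earns the same reward, all advantages are zero and that group contributes no learning signal, which is one reason curricula that keep examples at a useful difficulty tend to pair well with GRPO.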

The system's competitive AUROC of 0.927 indicates that explainability need not compromise detection accuracy. More significantly, the 72.3% mean win-rate in human evaluation of explanation quality (concreteness, falsifiability, coherence, plausibility, grounding) suggests the explanations give users substantive, checkable evidence to weigh rather than serving as window dressing. This shifts responsibility appropriately: the tool aids human judgment rather than replacing it.
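The two headline numbers rest on standard calculations, sketched here under stated assumptions. AUROC equals the probability that a randomly chosen AI-generated text receives a higher detector score than a randomly chosen human-written one (ties counted as half), so 0.927 means the detector orders about 92.7% of such pairs correctly. The mean win-rate is taken here as the average, over the five criteria, of each criterion's pairwise win fraction against a comparison system. Counting ties as half a win is an assumption, and the toy scores and judgments below are illustrative, not data from the paper.

```python
CRITERIA = ("concreteness", "falsifiability", "coherence", "plausibility", "grounding")


def auroc(scores_ai: list[float], scores_human: list[float]) -> float:
    """Pairwise (Mann-Whitney) AUROC: fraction of (AI, human) pairs in which
    the AI-generated text gets the higher 'AI-likeness' score; ties count 0.5."""
    wins = 0.0
    for a in scores_ai:
        for h in scores_human:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5
    return wins / (len(scores_ai) * len(scores_human))


def mean_win_rate(judgments: dict[str, list[str]]) -> float:
    """Average of per-criterion win rates. Each outcome is 'win', 'tie', or
    'loss' for the evaluated system; ties count as half a win (assumption)."""
    points = {"win": 1.0, "tie": 0.5, "loss": 0.0}
    rates = [
        sum(points[o] for o in judgments[c]) / len(judgments[c]) for c in CRITERIA
    ]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Toy detector scores (higher = more likely AI-generated); not real data.
    print(auroc([0.9, 0.7, 0.4], [0.1, 0.5, 0.3]))
    # Toy pairwise judgments: two comparisons per criterion.
    toy = {c: ["win", "loss"] for c in CRITERIA}
    toy["grounding"] = ["win", "tie"]
    print(mean_win_rate(toy))
```

The nested loop is O(n*m) and meant only to make the definition explicit; scikit-learn's `roc_auc_score` computes the same quantity efficiently for real evaluation sets.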

For the broader AI industry, TELL exemplifies a shift toward trustworthiness. Educational institutions, publishing platforms, and compliance teams increasingly demand interpretable AI systems. This research provides evidence that native explainability, built into a model during training rather than bolted on afterward, can deliver competitive accuracy alongside useful explanations. The methodology could influence how future detection systems, and potentially other high-stakes classifiers, are architected. Organizations deploying content-authentication infrastructure should assess whether their users need explanations, not just scores.

Key Takeaways

→ TELL achieves 0.927 AUROC while natively generating human-evaluable explanations for its classifications.
→ Its explainability-first architecture is competitive with state-of-the-art detectors, challenging the assumed tradeoff between accuracy and interpretability.
→ Human evaluation shows a 72.3% mean win-rate on explanation-quality criteria, indicating genuine utility for end users rather than superficial transparency.
→ The system reframes AI detection as a human-centric interpretability problem rather than a purely technical classification challenge.
→ Domain-specific supervised fine-tuning (SFT) followed by GRPO with curriculum learning emerges as an effective recipe for detection that is both accurate and explainable.