AEyeDE: An Attention-Based Attribution Framework for AI-Generated Text Detection
Researchers introduce AEyeDE, an attention-based attribution framework that detects AI-generated text by analyzing transformer model attention patterns rather than surface-level linguistic features. The method uses a lightweight CNN trained on attention maps from a proxy model and demonstrates strong performance across multiple settings, suggesting attention structures provide a reliable signal for distinguishing human from AI authorship.
The emergence of sophisticated language models has created an urgent need for robust detection mechanisms, as current likelihood-based and statistical approaches increasingly fail against improved AI systems. AEyeDE approaches the problem from a different angle: rather than analyzing the text itself, the framework examines how a transformer model internally processes that text through its attention mechanism. This marks a shift in detection philosophy, from surface-level signals to interpretable patterns in model behavior.
The research fits within a broader recognition that AI detection requires constant adaptation as generative models improve. Traditional detectors struggle because they target artifacts that sophisticated models can naturally avoid. AEyeDE instead relies on attention-based attribution matrices, which act as fingerprints of how a model distributes its focus across input tokens. The patterns it exploits appear to be inherent to how AI-generated text is processed, rather than output artifacts that are easy to correct. The method's robustness across encoder-decoder and decoder-only architectures, combined with its resilience to spelling perturbations and its cross-dataset transfer, suggests genuine generalizability rather than superficial pattern matching.
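To make the idea of an attention map concrete, the following is a minimal sketch of standard scaled dot-product attention weights, written in plain Python. It is not AEyeDE's attribution computation, which this summary does not specify. The function names and the toy vectors are hypothetical and chosen only for illustration. In practice the maps would come from a real proxy transformer; for example, Hugging Face Transformers models can return attention weights when called with `output_attentions=True`.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(queries, keys):
    """Scaled dot-product attention weights.

    Row i describes how strongly token i attends to every token.
    """
    scale = math.sqrt(len(keys[0]))
    scores = [
        [sum(q_c * k_c for q_c, k_c in zip(q, k)) / scale for k in keys]
        for q in queries
    ]
    return [softmax(row) for row in scores]


# Hypothetical 3-token input with 2-dimensional vectors (illustrative values only).
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attn = attention_map(toy_vectors, toy_vectors)

for row in attn:
    print([round(w, 3) for w in row])
```

Each row of `attn` is a probability distribution over the input tokens. A matrix of this shape is the kind of object a downstream classifier could treat as a single-channel image.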
For content platforms, AI service providers, and academic integrity systems, this work provides a technical foundation for more reliable detection infrastructure. The framework's interpretability also addresses a critical gap: detection systems that can explain *why* text appears AI-generated carry legal and practical advantages over black-box classifiers. The identification of recurring local structures in attention maps that differ systematically between human and AI text opens avenues for further research into attention-based forensics. If deployed at model inference time, AEyeDE could enable real-time content screening without relying on external detection services.
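The notion of "recurring local structures" can be illustrated with the basic operation inside a convolutional layer: sliding a small kernel over the attention map and recording where it responds. The sketch below is an assumption-laden toy, not the paper's network. The kernel is hand-written, whereas a trained CNN would learn its kernels, and the two example maps are invented. No claim is made about which pattern corresponds to human or AI text.

```python
def conv2d_valid(matrix, kernel):
    """Valid (no padding) 2D cross-correlation, as used in CNN layers."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(matrix) - k_h + 1
    out_w = len(matrix[0]) - k_w + 1
    return [
        [
            sum(
                matrix[i + a][j + b] * kernel[a][b]
                for a in range(k_h)
                for b in range(k_w)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_relu_response(matrix, kernel):
    """Apply the kernel, then ReLU, then global max pooling."""
    feature_map = conv2d_valid(matrix, kernel)
    return max(max(0.0, value) for row in feature_map for value in row)


# Hypothetical hand-written kernel that responds to 2x2 diagonal structure.
DIAGONAL_KERNEL = [[1.0, -1.0], [-1.0, 1.0]]

# Invented 4x4 attention maps: one concentrated on the diagonal, one uniform.
diagonal_map = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
uniform_map = [[0.25] * 4 for _ in range(4)]

diag_score = max_relu_response(diagonal_map, DIAGONAL_KERNEL)
flat_score = max_relu_response(uniform_map, DIAGONAL_KERNEL)
print(diag_score, flat_score)
```

The kernel responds strongly to the diagonal map and not at all to the uniform one. A lightweight CNN stacks many such learned filters, so its activations can be traced back to specific regions of the attention map, which is one way a detector of this kind could explain its decisions.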
- AEyeDE detects AI-generated text through attention pattern analysis rather than surface-level linguistic features, providing better robustness against sophisticated models.
- The method leverages interpretable attention maps as a discriminative signal, making detection results explainable rather than opaque.
- Strong performance across encoder-decoder and decoder-only architectures with cross-dataset transfer suggests the approach captures fundamental differences in how AI processes information.
- Recurring local structures in attention maps differ consistently between human and AI-generated text, opening new research directions for attention-based forensics.
- Public code release supports adoption and further development of attention-based detection methods across the research community.