MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text
Researchers introduce MELD, an AI-generated-text detector that uses multi-task learning to improve robustness against adversarial attacks, improve transfer to unseen models and domains, and keep false-positive rates low. The detector outperforms most open-source competitors and matches leading commercial systems on public benchmarks.
MELD addresses a critical gap in AI safety infrastructure as large language models become ubiquitous in writing workflows. Traditional binary classifiers optimize solely for separating human from AI text, leaving them vulnerable once that primary task saturates. MELD's innovation lies in enriching this binary objective with auxiliary supervision—training additional heads to identify generator families, attack types, and source domains while maintaining a single unified representation. This multi-task approach forces the shared encoder to learn structural patterns beyond simple human-AI distinction, creating more resilient feature representations.
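The shared-encoder, multi-head structure described above can be sketched as follows. This is an illustrative toy model, not the paper's architecture: the dimensions, head names, and single-layer encoder are all assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and task heads -- illustrative, not from the paper.
D_IN, D_HID = 32, 16
HEADS = {"human_vs_ai": 2, "generator_family": 5, "attack_type": 4, "source_domain": 6}

# Shared encoder parameters (one linear layer + tanh for brevity).
W_enc = rng.normal(scale=0.1, size=(D_IN, D_HID))
# One linear classification head per task on top of the shared representation.
W_heads = {name: rng.normal(scale=0.1, size=(D_HID, k)) for name, k in HEADS.items()}

def encode(x):
    """Single shared representation consumed by every task head."""
    return np.tanh(x @ W_enc)

def head_logits(z):
    """Per-task logits from the shared encoding."""
    return {name: z @ W for name, W in W_heads.items()}

def cross_entropy(logits, label):
    logits = logits - logits.max()  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[label]

# One training example: input features plus a label for each task.
x = rng.normal(size=D_IN)
labels = {"human_vs_ai": 1, "generator_family": 3, "attack_type": 0, "source_domain": 2}

z = encode(x)
losses = {name: cross_entropy(l, labels[name]) for name, l in head_logits(z).items()}
total = sum(losses.values())  # naive unweighted sum of the four task losses
```

Because every head backpropagates through the same `W_enc`, the auxiliary labels shape the shared features rather than living in separate models, which is the mechanism the paragraph describes.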
The detector architecture employs several sophisticated techniques. Homoscedastic uncertainty weighting balances four competing loss functions dynamically, preventing any single task from dominating training. An EMA teacher-student framework with attack augmentation improves robustness by having a clean teacher guide a student trained on adversarially modified inputs. Hard-negative pairwise ranking enlarges margins between AI text and the most confusable human samples, addressing the critical low false-positive regime where practical deployment matters most.
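The three training ingredients named above can each be written in a few lines. The sketch below assumes the common log-variance parameterization of homoscedastic uncertainty weighting (Kendall et al.), a standard EMA weight average for the teacher, and a plain hinge for the pairwise ranking term; the paper's exact formulations may differ.

```python
import numpy as np

def uncertainty_weighted_total(losses, log_vars):
    """Homoscedastic uncertainty weighting: each task loss L_i is scaled
    by exp(-s_i), where s_i = log(sigma_i^2) is a learned per-task
    parameter, and +s_i regularizes against inflating the variance."""
    return sum(np.exp(-s) * L + s for L, s in zip(losses, log_vars))

def ema_update(teacher, student, momentum=0.999):
    """EMA teacher update: the teacher tracks a slow moving average of
    the student's weights. The student trains on attack-augmented text
    while the clean teacher supplies stable targets."""
    return {k: momentum * teacher[k] + (1 - momentum) * student[k] for k in teacher}

def hard_negative_ranking_loss(ai_score, hard_human_score, margin=1.0):
    """Pairwise hinge: push the AI score above the most confusable
    (hard-negative) human score by at least `margin`, enlarging the
    separation that matters in the low-FPR regime."""
    return max(0.0, margin - (ai_score - hard_human_score))
```

A task whose learned `log_var` grows contributes less to the total, so noisy auxiliary objectives cannot dominate the binary head during training.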
MELD's performance demonstrates tangible improvements in deployment scenarios. Achieving 99.9% true positive rate at 1% false positive rate on recent models from major LLM providers suggests practical viability for content moderation at scale. The detector maintains standard computational cost at inference by discarding the auxiliary heads, enabling drop-in replacement for existing systems.
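The headline metric, true positive rate at a fixed false positive rate, can be computed by thresholding the detector's scores at the appropriate quantile of the human-text distribution. A minimal sketch with synthetic scores (the data here is illustrative, not benchmark output):

```python
import numpy as np

def tpr_at_fpr(human_scores, ai_scores, target_fpr=0.01):
    """Choose the decision threshold so that at most `target_fpr` of
    human texts are flagged, then report the fraction of AI texts
    that score above that threshold."""
    thresh = np.quantile(np.asarray(human_scores), 1.0 - target_fpr)
    return float((np.asarray(ai_scores) > thresh).mean())

# Synthetic, well-separated score distributions for illustration only.
rng = np.random.default_rng(1)
human = rng.normal(0.0, 1.0, 10_000)   # detector scores on human text
ai = rng.normal(5.0, 1.0, 10_000)      # detector scores on AI text
print(f"TPR at 1% FPR: {tpr_at_fpr(human, ai):.3f}")
```

Fixing the FPR first and measuring TPR second mirrors how moderation systems are actually tuned: the tolerable rate of false accusations is set as policy, and detection power is whatever remains.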
For organizations deploying content moderation, academic integrity systems, or provenance tracking, MELD represents meaningful progress in adversarial robustness—a persistent challenge in AI safety. The approach generalizes to unseen generators without retraining, reducing operational friction in rapidly evolving LLM ecosystems.
- MELD uses multi-task learning with uncertainty weighting to improve robustness beyond standard binary AI-text detection approaches
- The detector achieves 99.9% true positive rate at 1% false positive rate on recent models, demonstrating practical deployment viability
- Teacher-student distillation with adversarial augmentation significantly improves resistance to attack-based rewrites
- Inference computational cost matches standard detectors despite richer training, enabling practical enterprise adoption
- Strong generalization to unseen generators and domains reduces need for frequent retraining as new models emerge