AI · Neutral · arXiv – CS AI · 6h ago · 6/10
Log-Likelihood, Simpson's Paradox, and the Detection of Machine-Generated Text
Researchers identify a flaw in likelihood-based detection of machine-generated text: token-level likelihood signals behave inconsistently across regions of the detector model's hidden space, so aggregating them produces a Simpson's paradox that undermines existing detectors. They propose a learned local calibration method, and calibrated variants raise AUROC from 0.63 to 0.85 on GPT-5.4 text.
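To make the paradox concrete, here is a toy sketch in Python on synthetic data. It is not the paper's method: the two "regions" stand in for neighborhoods of a detector's hidden space, and the baselines, class mixes, the 0.5 offset, and the mean-centering step are all assumptions chosen for illustration. Within each region, machine text scores a higher log-likelihood than human text, yet the pooled signal points the other way because the regions differ in baseline and class mix.

```python
import random

def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def make_data(seed=0):
    """Synthetic (region, log_likelihood, label) rows; label 1 = machine, 0 = human."""
    rng = random.Random(seed)
    # Hypothetical setup: (region, baseline log-likelihood, n_human, n_machine).
    spec = [("A", -2.0, 400, 100), ("B", -5.0, 100, 400)]
    rows = []
    for region, base, n_human, n_machine in spec:
        for label, n in ((0, n_human), (1, n_machine)):
            for _ in range(n):
                # Machine text sits 0.5 above its region's baseline (assumed offset).
                rows.append((region, rng.gauss(base + 0.5 * label, 0.5), label))
    return rows

def calibrate_locally(rows):
    """Toy stand-in for local calibration: subtract each region's mean log-likelihood."""
    by_region = {}
    for region, ll, _ in rows:
        by_region.setdefault(region, []).append(ll)
    mean = {r: sum(v) / len(v) for r, v in by_region.items()}
    return [ll - mean[region] for region, ll, _ in rows]

rows = make_data()
labels = [y for _, _, y in rows]

# Pooled signal: "higher log-likelihood means machine" applied globally.
pooled_auc = auroc([ll for _, ll, _ in rows], labels)

# The same signal evaluated separately inside each region.
region_auc = {
    r: auroc([ll for g, ll, _ in rows if g == r], [y for g, _, y in rows if g == r])
    for r in ("A", "B")
}

# The signal after removing each region's baseline.
calibrated_auc = auroc(calibrate_locally(rows), labels)

print(f"pooled AUROC:     {pooled_auc:.2f}")
print(f"per-region AUROC: A={region_auc['A']:.2f}  B={region_auc['B']:.2f}")
print(f"calibrated AUROC: {calibrated_auc:.2f}")
```

In this toy setup the pooled AUROC falls below 0.5 even though the signal is informative inside each region, and removing the per-region baseline restores a score above chance. The paper's calibration is learned rather than a simple mean subtraction, so the numbers here say nothing about its reported results.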