AI · Bearish · arXiv – CS AI · 7h ago · 7/10
Detector-Evasive LLM Paraphrasing via Constrained Policy Optimization
Researchers present DEPO, a reinforcement learning algorithm that enables large language models to evade AI-text detectors through paraphrasing while maintaining semantic fidelity. The constrained optimization approach treats detector evasion as the primary objective with semantic preservation as an explicit constraint, demonstrating robust performance across multiple detectors and datasets.
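One common way to handle an objective with an explicit constraint in reinforcement learning is Lagrangian relaxation, and the sketch below illustrates that general idea only. It is not taken from the paper and may differ from how DEPO works. The names `lagrangian_reward`, `update_multiplier`, `tau`, `lam`, and `step_size` are hypothetical, and the detector and similarity scores are assumed to be numbers in the range 0 to 1.

```python
# Illustrative sketch only: a generic Lagrangian treatment of
# "maximize evasion subject to semantic similarity >= tau".
# All names and values here are assumptions, not DEPO's actual method.


def lagrangian_reward(evasion_score, similarity, lam, tau):
    """Combine the primary objective and the constraint into one scalar reward.

    evasion_score: assumed probability (0..1) that a detector labels the
        paraphrase as human-written.
    similarity: assumed semantic similarity (0..1) between source and paraphrase.
    lam: current Lagrange multiplier (non-negative).
    tau: hypothetical minimum acceptable similarity.
    """
    # The constraint term is negative when similarity falls below tau,
    # so violating paraphrases are penalized in proportion to lam.
    return evasion_score + lam * (similarity - tau)


def update_multiplier(lam, mean_similarity, tau, step_size=0.1):
    """Dual update: raise lam when the constraint is violated on average,
    lower it when satisfied, and never let it drop below zero."""
    return max(0.0, lam + step_size * (tau - mean_similarity))


if __name__ == "__main__":
    lam, tau = 1.0, 0.8
    # A paraphrase that evades well but drifts semantically is penalized.
    print(lagrangian_reward(0.9, 0.7, lam, tau))
    # The batch violated the constraint on average, so the multiplier grows.
    print(update_multiplier(lam, mean_similarity=0.7, tau=tau))
```

In this kind of scheme the policy is trained on the combined reward while the multiplier adapts between batches, so the penalty tightens only when paraphrases drift too far from the source meaning.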