AI · Neutral · arXiv – CS AI · 10h ago · 6/10
Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents
Nautilus Compass is a black-box persona drift detector for LLM coding agents. It needs no access to model weights, so it works with closed APIs such as Claude and GPT-4. The system uses embedding-based similarity matching to detect when a production agent forgets user constraints or contradicts prior agreements. It reports a ROC AUC of 0.83 on drift detection at a cost of $3.50 per evaluation, substantially cheaper than alternatives.
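As a rough illustration of what embedding-based similarity matching for drift detection can look like, here is a minimal Python sketch. It is not the Nautilus Compass implementation, which this summary does not describe. The function names (`embed`, `cosine`, `drift_score`, `is_drifting`), the 0.8 threshold, and the toy bag-of-words embedding are all assumptions made for illustration; a real detector would presumably call a sentence-embedding model instead.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a hypothetical stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def drift_score(constraints, reply):
    """1 minus the best similarity between the reply and any stated constraint."""
    reply_vec = embed(reply)
    best = max((cosine(embed(c), reply_vec) for c in constraints), default=0.0)
    return 1.0 - best


def is_drifting(constraints, reply, threshold=0.8):
    """Flag a reply whose drift score exceeds an assumed, untuned threshold."""
    return drift_score(constraints, reply) > threshold


constraints = ["use python with type hints", "never touch the database schema"]
print(is_drifting(constraints, "added type hints to the python module"))
print(is_drifting(constraints, "rewrote everything in rust"))
```

Only the agent's text output is inspected, which is what makes an approach like this black-box. Note that word-overlap vectors cannot tell agreement from contradiction, so this sketch only catches replies that wander away from the stated constraints.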
🧠 GPT-4 · 🧠 Claude