AI | Neutral | Importance 6/10

Nautilus Compass: Black-box Persona Drift Detection for Production LLM Agents

arXiv – CS AI|Chunxiao Wang|May 12, 2026 at 04:00 AM

AI Summary

Nautilus Compass is a black-box persona drift detector for LLM coding agents. It operates without access to model weights, which makes it compatible with closed APIs such as Claude and GPT-4. The system uses embedding-based similarity matching to detect when production agents forget user constraints or contradict prior agreements. It achieves 0.83 ROC AUC on drift detection at a cost of $3.50 per evaluation, substantially cheaper than alternatives.
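For readers less familiar with the metric: ROC AUC is the probability that a randomly chosen positive example (here, a drifted turn) receives a higher detector score than a randomly chosen negative one, so 0.5 is chance and 1.0 is perfect separation. The sketch below computes it by direct pairwise comparison. The function name and the sample labels and scores are made up for illustration and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for a drifted turn, 0 for a consistent turn (illustrative only).
    scores: detector output, where higher means "more likely drifted".
    Ties between a positive and a negative count as half a correct pair.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Made-up example: 3 of the 4 positive/negative pairs are ordered correctly.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; evaluation libraries typically use a sort-based equivalent.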

Analysis

Nautilus Compass addresses a practical pain point in production LLM agent deployment: behavioral consistency degrades over extended sessions. As LLM agents become integral to development workflows, ensuring they maintain user-specified constraints and remember prior decisions is critical for reliability and user trust. This work matters because it removes the need for model-weight access, a requirement that shuts out developers using closed APIs, which represent the majority of production LLM interactions.

The technical approach represents a deliberate trade-off. By operating entirely at the prompt-text embedding layer without calling LLMs during indexing, Compass achieves dramatically lower costs ($3.50 versus systems requiring model inference) and faster deployment cycles. However, this architectural choice creates performance ceilings: the 56.6% score on LongMemEval-S is substantially below white-box approaches reaching 90%+. The developers transparently acknowledge this trade-off, treating it as an acceptable compromise for accessibility and cost.
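To make the embedding-layer idea concrete, here is a minimal sketch of similarity-based drift scoring that makes no LLM calls. It is not the Compass implementation: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a real sentence-embedding model, and `drift_score` and the sample strings are assumptions chosen for illustration.

```python
import hashlib
import math
import re


def toy_embed(text, dim=256):
    """Hypothetical stand-in for a real embedding model.

    Hashes each lowercase token into one of `dim` buckets and
    L2-normalizes the resulting count vector.
    """
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9_]+", text.lower()):
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity of two vectors that are already L2-normalized."""
    return sum(x * y for x, y in zip(a, b))


def drift_score(constraints, agent_message):
    """Return 1 minus the best similarity to any stored constraint.

    Assumption: a higher value means the message sits further from
    everything the user asked the agent to remember.
    """
    message_vec = toy_embed(agent_message)
    best = max((cosine(toy_embed(c), message_vec) for c in constraints), default=0.0)
    return max(0.0, 1.0 - best)


# Illustrative strings only; not drawn from the paper's test data.
constraints = ["always use tabs for indentation in python files"]
on_topic = drift_score(constraints, "I will use tabs for indentation in python files")
off_topic = drift_score(constraints, "deploying docker image to staging cluster")
```

A detector built this way would compare the score against a tuned threshold. It also hints at the performance ceiling noted above: pure similarity can tell that a message is far from a constraint, but it cannot reason about whether a nearby message actually contradicts it.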

For developers and organizations deploying coding agents in production, Compass offers immediate utility through multiple interfaces (Claude plugin, MCP server, CLI, REST API) and verifiable audit trails. The Merkle-chained audit log addresses security and compliance concerns around memory manipulation. Market impact remains niche, since the tool targets a specific subset of developers building agent systems, but the work establishes that practical, affordable persona consistency is achievable without proprietary access.
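The summary does not describe the log's exact structure, so the following is only a sketch of the general tamper-evidence idea, using a plain linear hash chain: each entry's hash covers the previous entry's hash, so editing any past memory update invalidates everything after it. The field names, the `GENESIS` value, and the sample payloads are all assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def entry_hash(prev_hash, payload):
    """SHA-256 over the previous hash plus a canonical JSON payload."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(log, payload):
    """Append a memory update, linking it to the entry before it."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify_log(log):
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


# Illustrative usage with made-up memory updates.
audit_log = []
append_entry(audit_log, {"op": "remember", "text": "use tabs, not spaces"})
append_entry(audit_log, {"op": "remember", "text": "target Python 3.11"})
chain_ok = verify_log(audit_log)
```

An auditor who stores only the latest hash can later detect whether any earlier entry was altered, which is the property that makes such a log useful for compliance review.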

The open question is whether this kind of black-box, embedding-only design becomes table stakes for cost-sensitive deployments, or whether the performance gap drives adoption toward costlier white-box solutions. Open-sourcing the codebase should speed iterative improvement and help establish community standards for agent memory evaluation.

Key Takeaways

→Nautilus Compass enables persona drift detection for closed-API LLMs (Claude, GPT-4) without requiring model weight access, solving a deployment constraint for most production users.
→The embedding-based approach costs ~$3.50 per evaluation (14x cheaper than comparable systems) by avoiding LLM calls during indexing.
→Drift detection reaches 0.83 ROC AUC, but the 56.6% score on LongMemEval-S sits more than 30 percentage points below white-box baselines at 90%+, an intentional cost-vs-accuracy trade-off.
→The system ships as multiple integrations (Claude plugin, MCP server, CLI, REST API) with Merkle-chained audit logs for tamper-evident memory updates.
→MIT-licensed code and frozen test data enable reproducibility and establish community benchmarks for evaluating agent memory consistency.
