Path-Lock Expert: Separating Reasoning Modes in Hybrid Thinking at the Architecture Level
Researchers propose Path-Lock Expert (PLE), an architectural solution that separates reasoning and non-reasoning modes in hybrid-thinking language models by replacing single MLPs with two specialized experts. The approach significantly reduces reasoning leakage in non-reasoning mode while maintaining strong performance in reasoning tasks, suggesting that controllable hybrid thinking is fundamentally an architectural problem rather than a training problem.
Path-Lock Expert addresses a fundamental limitation in current hybrid-thinking language models: the inability to cleanly separate reasoning and non-reasoning modes at the architectural level. Despite advances in data curation and multi-stage training, both modes remain encoded in shared feed-forward parameters, causing reasoning leakage even when models should operate in non-reasoning mode. This leakage manifests as verbose, self-reflective outputs when concise answers are needed.
The technical innovation is elegant in its simplicity. Rather than relying on training techniques alone, PLE replaces the single MLP in each decoder layer with two semantically locked experts, one dedicated to reasoning and one to non-reasoning. A deterministic control-token router selects a single expert path for the entire sequence, ensuring clean mode separation while preserving the dense model's computational efficiency: only one expert is active per forward pass, so inference cost matches the original dense model. Critically, attention mechanisms, embeddings, normalization, and the language-model head remain shared, minimizing architectural overhead.
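The routing idea can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: the control-token ids, weight shapes, and ReLU activation are all assumptions chosen for clarity. The key property it demonstrates is that a single control token deterministically selects one of two MLP experts for the whole sequence, with everything else shared.

```python
import numpy as np

# Hypothetical control-token ids -- placeholders for illustration; the real
# ids come from the model's tokenizer and are not specified in the source.
THINK_TOKEN, NO_THINK_TOKEN = 0, 1

class PathLockLayer:
    """One decoder layer's feed-forward block with two mode-locked experts.

    Attention, embeddings, norms, and the LM head would stay shared;
    only the MLP is duplicated, one expert per mode.
    """
    def __init__(self, d_model, d_ff, rng):
        def make_mlp():
            return (rng.standard_normal((d_model, d_ff)) * 0.02,
                    rng.standard_normal((d_ff, d_model)) * 0.02)
        self.reasoning_expert = make_mlp()
        self.non_reasoning_expert = make_mlp()

    def forward(self, hidden, mode_token):
        # Deterministic routing: the control token picks ONE expert for
        # the entire sequence -- no per-token soft mixing as in MoE.
        w_in, w_out = (self.reasoning_expert if mode_token == THINK_TOKEN
                       else self.non_reasoning_expert)
        return np.maximum(hidden @ w_in, 0.0) @ w_out  # simple ReLU MLP

rng = np.random.default_rng(0)
layer = PathLockLayer(d_model=8, d_ff=16, rng=rng)
seq = rng.standard_normal((4, 8))  # (tokens, d_model)
out_think = layer.forward(seq, THINK_TOKEN)
out_plain = layer.forward(seq, NO_THINK_TOKEN)
```

Because the route is fixed per sequence, gradients in each mode only flow through that mode's expert, which is what locks reasoning behavior out of the non-reasoning path.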
The empirical results are compelling. On Qwen3-4B, PLE reduces reflective tokens in non-reasoning mode from 2.54 to 0.39 on the AIME24 benchmark while improving non-reasoning accuracy from 20.67% to 40.00%, all without degrading reasoning performance. This suggests the research community has been approaching hybrid thinking incorrectly: treating it as a training problem when it fundamentally requires architectural innovation.
For AI development and deployment, these findings indicate that future language models may require purpose-built architectural patterns rather than universal designs. Organizations developing reasoning systems or requiring mode-controlled inference will likely benefit from this approach, potentially improving both performance and resource efficiency in production environments.
- PLE separates reasoning and non-reasoning modes through dedicated expert pathways rather than training techniques alone
- The architecture reduces reasoning leakage in non-reasoning mode by 85% while improving accuracy by 19+ percentage points
- Deterministic routing preserves computational efficiency while enabling clean mode separation
- Results suggest hybrid thinking requires architectural innovation, not just better training methods
- The approach maintains performance across reasoning benchmarks while substantially improving non-reasoning output quality