Bridging the Detection-to-Abstention Gap in Reasoning Models under Insufficient Information
Researchers identify a critical failure mode in large reasoning models: the models detect that information is insufficient but still produce unsupported answers instead of abstaining. The proposed Judge-Then-Solve (JTS) framework trains models to commit explicitly to an answerability judgment before reasoning, which the authors report significantly improves safe abstention rates and inference efficiency.
Large reasoning models exhibit a paradoxical weakness: they can recognize that a problem lacks sufficient information, yet still go on to generate confident, unfounded answers. This detection-to-abstention gap is a disconnect between what a model recognizes about incomplete data and how it actually behaves. It is especially dangerous in high-stakes applications such as medical diagnostics, where an incorrect answer can cause more harm than an honest refusal.
The Judge-Then-Solve framework addresses this by restructuring the reasoning pipeline. Rather than treating abstention as a final-answer formatting choice, JTS makes it a control decision that occurs before solution generation. Models are first trained with supervision to issue an explicit answerability judgment, and they proceed to solve only if that judgment is positive. Reinforcement learning with consistency and length-shaping rewards then refines this behavior, encouraging models to cut off unproductive reasoning trajectories early instead of continuing them.
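The control flow described above can be illustrated with a minimal sketch. This is not the authors' implementation: in JTS the judgment and the solution come from a trained model, whereas here `toy_judge`, `toy_solve`, `Problem`, and the `ABSTAIN` marker are hypothetical stand-ins chosen only to show that the answerability decision gates the solver.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

ABSTAIN = "ABSTAIN"  # hypothetical marker for a refusal to answer


@dataclass
class Problem:
    question: str
    givens: Dict[str, float]   # premises stated in the problem
    required: Tuple[str, ...]  # premises needed to answer it


@dataclass
class JTSResult:
    answerable: bool
    answer: str
    solver_called: bool


def judge_then_solve(
    problem: Problem,
    judge: Callable[[Problem], bool],
    solve: Callable[[Problem], str],
) -> JTSResult:
    """Commit to an answerability judgment first; solve only if it is positive."""
    if not judge(problem):
        # Abstain before any solution reasoning is generated.
        return JTSResult(answerable=False, answer=ABSTAIN, solver_called=False)
    return JTSResult(answerable=True, answer=solve(problem), solver_called=True)


def toy_judge(problem: Problem) -> bool:
    # Toy stand-in for the model's judgment: are all required premises given?
    return all(name in problem.givens for name in problem.required)


def toy_solve(problem: Problem) -> str:
    # Toy stand-in for the model's reasoning: speed = distance / time.
    return str(problem.givens["distance"] / problem.givens["time"])


missing = Problem("What is the speed?", {"distance": 100.0}, ("distance", "time"))
complete = Problem("What is the speed?", {"distance": 100.0, "time": 4.0}, ("distance", "time"))

print(judge_then_solve(missing, toy_judge, toy_solve))
print(judge_then_solve(complete, toy_judge, toy_solve))
```

The point of the sketch is the ordering: because the judgment comes first, an unanswerable problem never reaches the solver, which is where the efficiency gain on unanswerable inputs would come from.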
This work carries significant implications for AI safety and deployment. The reported push toward saturating "Abstention@Detection", the rate at which detected insufficiency actually leads to abstention, points to a pathway for safer deployment of reasoning models. By eliminating unnecessary downstream reasoning on unanswerable problems, JTS improves safety and computational efficiency at the same time. The observation that missing-premise training also reduces unproductive self-reflection on difficult but answerable questions suggests that these techniques may improve reasoning quality across problem categories.
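As a rough illustration of the metric, the sketch below computes an Abstention@Detection-style rate from per-example records. The exact definition used by the authors is not given in this summary, so the formula here is an assumption: the fraction of examples in which the model detected insufficiency that also ended in abstention. The record field names are hypothetical.

```python
from typing import Dict, List, Optional


def abstention_at_detection(records: List[Dict[str, bool]]) -> Optional[float]:
    """Assumed definition: P(abstained | detected_insufficient).

    Returns None when no example was flagged as insufficient, because the
    conditional rate is undefined in that case.
    """
    detected = [r for r in records if r["detected_insufficient"]]
    if not detected:
        return None
    abstained = sum(1 for r in detected if r["abstained"])
    return abstained / len(detected)


# Hypothetical evaluation records, not data from the paper.
records = [
    {"detected_insufficient": True, "abstained": True},
    {"detected_insufficient": True, "abstained": True},
    {"detected_insufficient": True, "abstained": False},  # the gap: detected, answered anyway
    {"detected_insufficient": True, "abstained": True},
    {"detected_insufficient": False, "abstained": False},
]

print(abstention_at_detection(records))
```

Under this assumed definition, a value of 1.0 corresponds to the saturation described above: every detected case of insufficient information ends in abstention.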
For organizations deploying reasoning models in regulated domains, JTS could serve as a foundational control mechanism. Future work will likely focus on generalizing these techniques across model architectures and on scaling to longer reasoning chains, where the cost of abandoned reasoning becomes more pronounced.
- Large reasoning models can detect missing information but still generate unsupported answers, creating a detection-to-abstention gap
- Judge-Then-Solve framework trains models to commit to answerability before reasoning, acting as an explicit control mechanism
- JTS reduces harmful reasoning by terminating unanswerable trajectories early, improving both safety and inference efficiency
- Missing-premise training improves abstention rates across datasets and reduces unproductive self-reflection on difficult problems
- Abstention control emerges as a critical safety requirement for deploying reasoning models in high-risk domains