Auditable Decision Models with Learned Abstention and Real-Time Steering

arXiv – CS AI | Sankaranarayanan Palamadai Chandrasekaran | May 28, 2026 at 04:00 AM

🤖AI Summary

Researchers introduce EvaluatorDPT, a decision-control model that outputs YES, NO, or TBD (to be determined) for high-stakes AI applications in which the available evidence may not support a confident decision. The system learns deferral as an explicit outcome rather than hiding uncertainty in forced predictions, and it achieves 82.6% accuracy with auditable, policy-governed decision routing that can be inspected and controlled at inference time.

Analysis

EvaluatorDPT addresses a critical gap in production AI systems: the inability to handle uncertainty explicitly without either forcing predictions on insufficient evidence or falling back on free-form generated outputs. Traditional classifiers force binary or multi-class decisions even when the evidence is insufficient, while generative systems produce outputs that are interpretable but difficult to audit. This research presents a bounded decision framework in which deferral is a learned, first-class outcome rather than a post-hoc confidence threshold applied after prediction.
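To make the contrast concrete, the sketch below compares post-hoc abstention on a binary classifier with a model that treats TBD as a third learned class. It is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the 0.75 threshold, and the logit values are hypothetical.

```python
import math

LABELS = ("YES", "NO", "TBD")


def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def posthoc_abstain(p_yes, threshold=0.75):
    # Post-hoc abstention (hypothetical baseline): a binary classifier emits
    # P(YES), and deferral is added afterwards by refusing to decide when
    # neither class is confident enough. The threshold is not learned.
    if p_yes >= threshold:
        return "YES"
    if 1.0 - p_yes >= threshold:
        return "NO"
    return "TBD"


def learned_deferral(logits):
    # Learned deferral: TBD is a third output class trained like YES and NO,
    # so "defer" receives its own probability mass instead of being inferred
    # from low confidence in the other two classes.
    probs = softmax(logits)
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], probs


print(posthoc_abstain(0.6))             # "TBD": neither side reaches 0.75
print(learned_deferral([0.2, 0.1, 2.0])[0])  # "TBD": the deferral class wins
```

The practical difference is where the deferral decision lives: in the baseline it is a knob applied to a model that never saw "defer" during training, while in the learned variant the training data itself teaches the model which inputs warrant deferral.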

The architecture uses a transformer encoder with a primary decision head and auxiliary channels for sentiment and value signals, and it is designed for domain-agnostic deployment across industries. On 44,597 test samples, the model reaches 82.6% accuracy with balanced per-class performance (F1 scores: 0.83 for YES, 0.85 for NO, 0.80 for TBD) and a low expected calibration error (ECE = 0.0338). Critically, inference-time routing remains inspectable because the operating thresholds in force are recorded, which supports external auditing and regulatory compliance.
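Expected calibration error (ECE) summarizes how far predicted confidence drifts from observed accuracy. A common definition bins predictions by top-label confidence and takes the sample-weighted average of |accuracy - confidence| over the bins. The summary does not say how EvaluatorDPT computes ECE (bin count, binning scheme, or how TBD predictions are scored), so this sketch assumes 10 equal-width bins; the function name is hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    # confidences: top-label predicted probability per sample, in [0, 1]
    # correct: True if the top-label prediction matched the ground truth
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")
    n = len(confidences)
    counts = [0] * n_bins
    conf_sum = [0.0] * n_bins
    acc_sum = [0.0] * n_bins
    for c, ok in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        b = min(int(c * n_bins), n_bins - 1)
        counts[b] += 1
        conf_sum[b] += c
        acc_sum[b] += 1.0 if ok else 0.0
    ece = 0.0
    for k in range(n_bins):
        if counts[k]:
            # Gap between mean accuracy and mean confidence in this bin,
            # weighted by the share of samples that fall into it.
            gap = abs(acc_sum[k] / counts[k] - conf_sum[k] / counts[k])
            ece += (counts[k] / n) * gap
    return ece


# Two predictions at 0.9 confidence, only one correct: overconfident by 0.4.
print(expected_calibration_error([0.9, 0.9], [True, False]))
```

On the same reported numbers, averaging the three per-class F1 scores gives a macro-F1 of (0.83 + 0.85 + 0.80) / 3 ≈ 0.827, consistent with the "balanced performance" description.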

This work has substantial implications for regulated industries, where decision auditability directly affects liability and stakeholder trust. Financial services, healthcare, and legal systems require transparent reasoning trails that black-box models typically cannot provide. The emphasis on reproducibility, including threshold sweeps, multi-seed stability checks, and confusion matrices, reflects a rigor that goes beyond reporting a single headline metric. The learned deferral mechanism bridges a practical divide between forcing decisions on uncertain evidence and abandoning systematic decision-making entirely, creating room for human-in-the-loop workflows with measurable governance controls.
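The summary mentions threshold sweeps without describing the protocol. A generic selective-prediction sweep works as follows: for each candidate threshold, cases below it are deferred, and the sweep records coverage (share of cases decided) and accuracy on the decided cases only. This is a hypothetical sketch; the triple format, function name, and toy data are assumptions, and it treats TBD purely as deferral rather than as a learned class.

```python
def threshold_sweep(preds, thresholds):
    # preds: list of (predicted_label, confidence, gold_label) triples
    #        (hypothetical format; labels are "YES" or "NO").
    # A case is decided when confidence >= threshold; otherwise it is
    # deferred (TBD) and excluded from the accuracy computation.
    rows = []
    n = len(preds)
    for t in thresholds:
        decided = [(p, g) for p, c, g in preds if c >= t]
        correct = sum(1 for p, g in decided if p == g)
        rows.append({
            "threshold": t,
            "coverage": len(decided) / n if n else 0.0,
            # None when every case was deferred at this threshold.
            "selective_accuracy": correct / len(decided) if decided else None,
        })
    return rows


toy = [
    ("YES", 0.95, "YES"),
    ("NO", 0.85, "NO"),
    ("YES", 0.60, "NO"),
    ("NO", 0.55, "YES"),
]
for row in threshold_sweep(toy, [0.5, 0.7, 0.9]):
    print(row)
```

Publishing a table like this alongside the chosen operating point lets a reviewer see exactly what accuracy was traded for how much deferral, which is the auditability argument the paragraph makes.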

Key Takeaways

→ Learned deferral as a bounded outcome lets AI systems route uncertain cases explicitly rather than hiding uncertainty in forced predictions or generative outputs
→ EvaluatorDPT achieves 82.6% accuracy with well-calibrated confidence (ECE = 0.0338) and auditable inference-time controls suited to regulated industries
→ Auxiliary semantic signals and policy-governed thresholds offer a path toward explainable control of AI behavior beyond black-box confidence scores
→ Reproducibility evidence, including threshold sweeps and multi-seed stability checks, supports external review and regulatory compliance
→ A domain-agnostic interface allows deployment across industries where decision transparency and auditability are non-negotiable
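Policy-governed thresholds and inspectable inference-time controls can be pictured as a versioned policy object applied to the model's three-way output, with every decision serialized together with the policy that produced it. The sketch below is a hypothetical illustration of that idea; `RoutingPolicy`, `route`, the field names, and the threshold values are assumptions, not EvaluatorDPT's actual interface.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RoutingPolicy:
    # Hypothetical operating point: a version tag plus the minimum
    # probability YES or NO must reach before the system commits.
    version: str
    min_yes: float
    min_no: float


def route(probs, policy):
    # probs: dict with "YES", "NO", "TBD" probabilities from the model.
    # Commit to YES or NO only if it is the top class and clears the
    # policy threshold; everything else is routed to TBD (human review).
    top = max(probs, key=probs.get)
    if top == "YES" and probs["YES"] >= policy.min_yes:
        decision = "YES"
    elif top == "NO" and probs["NO"] >= policy.min_no:
        decision = "NO"
    else:
        decision = "TBD"
    # The audit record holds what is needed to replay the decision:
    # the raw probabilities and the exact thresholds in force.
    record = {"decision": decision, "probs": probs, "policy": asdict(policy)}
    return decision, json.dumps(record, sort_keys=True)


probs = {"YES": 0.7, "NO": 0.2, "TBD": 0.1}
lenient = RoutingPolicy(version="v1", min_yes=0.6, min_no=0.6)
strict = RoutingPolicy(version="v2", min_yes=0.8, min_no=0.8)
print(route(probs, lenient)[0])  # "YES"
print(route(probs, strict)[0])   # "TBD"
```

Swapping `lenient` for `strict` steers the same model output without retraining, and the serialized record shows which policy version produced each decision. The paper's actual steering mechanism may differ from this simple threshold gate.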