Cherry-pick Override: Unsafe Directional Commitment in LLM Judges under Mixed Evidence
Researchers identify a critical failure mode called Cherry-pick Override (CCO) where large language model judges make unsafe directional commitments when evaluating mixed evidence containing both supporting and refuting claims. The study demonstrates that LLM judges incorrectly return definitive verdicts on over 84% of conflicting-evidence cases instead of acknowledging ambiguity, with panel voting amplifying rather than mitigating this bias.
This research exposes a fundamental vulnerability in how LLM-based fact-checking systems handle nuanced, contradictory information. When the evidence contains valid arguments on both sides of a claim, the task contract typically permits a neutral "Conflicting" verdict; however, the tested models consistently bypass this option and commit to a directional judgment (SUPPORTS or REFUTES), a failure the authors term Cherry-pick Override. This matters because fact-checking systems increasingly influence content moderation, content ranking, and policy decisions across platforms serving billions of users.
The study methodically tested the interventions that practitioners typically deploy (typed schemas, panel voting, confidence thresholding, and validator filtering) and found that each leaves residual CCO failures. Notably, panel aggregation through majority voting amplifies directional commitment rather than suppressing it, suggesting that scale alone doesn't solve the underlying problem. The authors' proposed solution is a two-channel architecture that separates verdict generation from commitment authorization, using both structural evidence quality and confidence scores as independent safety signals.
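The two-channel idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Evidence`, `JudgeOutput`, `evidence_backs`, and `authorize`, the per-item stance labels, and the `min_confidence` threshold of 0.8 are all hypothetical choices made for illustration. The sketch assumes each evidence item already carries a stance label from some upstream step, and it authorizes a directional verdict only when both the structural check and the confidence check pass.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    SUPPORTS = "SUPPORTS"
    REFUTES = "REFUTES"
    CONFLICTING = "CONFLICTING"


@dataclass(frozen=True)
class Evidence:
    text: str
    stance: Verdict  # assumed to be labeled upstream (hypothetical field)


@dataclass(frozen=True)
class JudgeOutput:
    verdict: Verdict   # channel 1: what the judge proposes
    confidence: float  # judge-reported confidence in [0, 1]


def evidence_backs(verdict: Verdict, evidence: list[Evidence]) -> bool:
    """Structural signal: evidence is non-empty and every item shares the verdict's stance."""
    return bool(evidence) and all(item.stance is verdict for item in evidence)


def authorize(
    proposal: JudgeOutput,
    evidence: list[Evidence],
    min_confidence: float = 0.8,  # illustrative threshold, not taken from the study
) -> Verdict:
    """Channel 2: decide whether the proposed directional verdict may be committed."""
    if proposal.verdict is Verdict.CONFLICTING:
        return Verdict.CONFLICTING  # abstention never needs authorization

    structurally_safe = evidence_backs(proposal.verdict, evidence)
    confident_enough = proposal.confidence >= min_confidence

    # Both independent signals must agree; otherwise fall back to the neutral verdict.
    if structurally_safe and confident_enough:
        return proposal.verdict
    return Verdict.CONFLICTING


mixed = [
    Evidence("Study A reports an effect.", Verdict.SUPPORTS),
    Evidence("Study B finds no effect.", Verdict.REFUTES),
]
# A highly confident judge is still downgraded because the evidence is mixed.
print(authorize(JudgeOutput(Verdict.SUPPORTS, 0.99), mixed).value)
```

The design point is that confidence alone cannot unlock a directional verdict in this sketch: a judge that is highly confident on mixed evidence is still routed to the neutral verdict, which is the case a single confidence threshold would let through.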
For AI system builders, this work highlights that schema design and aggregation strategies cannot, on their own, make a judge abstain when the evidence is genuinely mixed. Organizations deploying LLM judges in high-stakes domains (legal review, medical claim assessment, public health communication) should recognize that default voting mechanisms may systematically bias outcomes toward false confidence. The research suggests that architectural changes, rather than parameter tuning or post-hoc filtering, are necessary, which may require substantial system redesign for real-world deployment in contexts where acknowledging genuine ambiguity carries legal or ethical obligations.
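One simple way to see why majority voting can push outcomes toward false confidence is a back-of-the-envelope binomial calculation. The sketch below is an illustration under strong assumptions that are not taken from the study: judges vote independently, every judge commits to a direction with the same probability, and all committing judges pick the same direction. The function name `majority_commit_probability` is hypothetical, and the 0.84 rate is used only as an illustrative per-judge value based on the figure reported above.

```python
from math import comb


def majority_commit_probability(p_commit: float, panel_size: int) -> float:
    """Probability that a strict majority of an odd-sized panel commits to a direction.

    Assumes independent judges with an identical commit probability (illustrative only).
    """
    if panel_size < 1 or panel_size % 2 == 0:
        raise ValueError("panel_size must be a positive odd integer")
    if not 0.0 <= p_commit <= 1.0:
        raise ValueError("p_commit must lie in [0, 1]")

    p_abstain = 1.0 - p_commit
    votes_needed = panel_size // 2 + 1  # smallest strict majority
    return sum(
        comb(panel_size, k) * p_commit**k * p_abstain ** (panel_size - k)
        for k in range(votes_needed, panel_size + 1)
    )


for size in (1, 3, 5):
    print(size, round(majority_commit_probability(0.84, size), 3))
```

Under these assumptions, a per-judge commit rate of 0.84 rises to roughly 0.93 for a three-judge panel and roughly 0.97 for a five-judge panel. Whenever most individual judges already lean toward committing, a majority rule makes the panel more likely to commit, not less.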
- LLM judges override neutral verdicts on 84%+ of mixed-evidence claims despite task contracts permitting "Conflicting" responses
- Panel voting amplifies rather than mitigates directional bias, worsening the cherry-pick override failure
- Common single-channel interventions (confidence thresholding, typed schemas) fail to operationally separate true directional commitments from unsafe ones
- A two-channel architecture using evidence structure and confidence as orthogonal safety signals shows promise for commitment control
- Organizations deploying LLM judges require structural redesign, not tuning, to safely handle genuinely ambiguous evidence