DecomposeRL: Learning to Ask Useful, Informative, and Diverse Questions for Semi-Supervised, Traceable Claim Verification
DecomposeRL is a reinforcement learning approach to claim verification that pairs high accuracy with interpretability through decomposition-based reasoning. A 7B-parameter model trained on just 5K curated claims matches 32B baselines and GPT-4.1-mini across 11 benchmarks and supports semi-supervised training, indicating that careful data curation can stand in for much of the benefit of scale.
DecomposeRL addresses a fundamental tradeoff in NLP: end-to-end models deliver high accuracy but produce black-box outputs, while decomposition-based systems offer transparency at the cost of accuracy. This research narrows that gap by combining reinforcement learning with strategic data curation, suggesting that data quality can matter more than quantity for this task. The model is trained via GRPO (Group Relative Policy Optimization) with a multi-faceted reward ensemble that steers it toward generating verifiable reasoning chains alongside its predictions.
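As a rough illustration of the GRPO idea mentioned above, the sketch below scores several sampled outputs for one claim with a weighted reward ensemble, then standardizes those rewards within the group to obtain advantages. The component names, weights, and scores are hypothetical; the summary does not list the actual reward terms, and this is not the authors' implementation.

```python
from statistics import mean, pstdev

# Hypothetical reward components and weights (assumption: the source only
# says the reward is a "multi-faceted ensemble" and does not name its parts).
WEIGHTS = {"correct": 1.0, "informative": 0.5, "diverse": 0.5}


def ensemble_reward(scores):
    """Weighted sum of per-component scores for one sampled output."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize each reward against the other samples in its group.

    Outputs that beat the group average get a positive advantage,
    outputs below it get a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations vary here
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled outputs for the same claim, with made-up component scores.
group = [
    {"correct": 1.0, "informative": 0.8, "diverse": 0.6},
    {"correct": 1.0, "informative": 0.2, "diverse": 0.1},
    {"correct": 0.0, "informative": 0.7, "diverse": 0.9},
    {"correct": 0.0, "informative": 0.1, "diverse": 0.2},
]
rewards = [ensemble_reward(s) for s in group]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f} advantage={a:+.2f}")
```

Because advantages are computed relative to the group, no separate value network is needed; a group whose outputs all receive the same reward yields zero advantage for every sample.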
The key result comes from aggressive data curation: the training set is cut from 115K claims to 5K high-signal examples without losing learning effectiveness. This addresses a critical pain point in RL training, namely computational cost, and makes the approach more accessible to resource-constrained organizations. The model's performance across biomedical, political, scientific, and general domains indicates robustness beyond narrow specialization and suggests that the decomposition framework generalizes well.
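The summary does not say how the 5K examples were chosen. One common heuristic for group-based RL, shown here purely as an assumption and not as the authors' method, is to keep claims whose sampled outputs disagree in reward, since a group with identical rewards provides no group-relative learning signal. The function name, claim IDs, and reward values below are hypothetical.

```python
from statistics import pvariance


def select_high_signal(claims, rollout_rewards, budget):
    """Keep up to `budget` claims whose rollout rewards vary the most."""
    scored = [(pvariance(rollout_rewards[c]), c) for c in claims]
    # Drop zero-variance claims: every rollout scored the same, so
    # group-relative advantages would all be zero.
    informative = [(v, c) for v, c in scored if v > 0]
    informative.sort(key=lambda vc: vc[0], reverse=True)
    return [c for _, c in informative[:budget]]


# Made-up rewards from four sampled outputs per claim.
rollout_rewards = {
    "claim_a": [1.0, 1.0, 1.0, 1.0],  # always solved: no signal
    "claim_b": [1.0, 0.0, 1.0, 0.0],  # mixed outcomes: strong signal
    "claim_c": [0.0, 0.0, 0.0, 1.0],  # mostly failed: some signal
    "claim_d": [0.0, 0.0, 0.0, 0.0],  # never solved: no signal
}
selected = select_high_signal(list(rollout_rewards), rollout_rewards, budget=2)
print(selected)
```

In this toy setup only the two claims with mixed outcomes survive the filter, which is the sense in which a small curated set can carry most of the training signal.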
For enterprises requiring transparent AI decision-making, particularly in regulated industries like healthcare, finance, and legal services, this represents meaningful progress. The semi-supervised capabilities enable practical deployment where labeled data remains scarce. The model's roughly fourfold smaller size relative to the 32B baselines (7B versus 32B parameters) has implications for deployment costs and carbon footprint, factors increasingly important in enterprise procurement decisions.
The research opens questions about whether similar curation-focused approaches apply to other complex NLP tasks requiring both accuracy and interpretability. Industry adoption likely depends on benchmarking against domain-specific claim verification datasets and evaluating the quality of generated reasoning traces in real-world applications.
- DecomposeRL achieves 86.3% in-domain accuracy while maintaining inspectable reasoning traces, solving a key tradeoff between performance and interpretability
- Strategic data curation reducing 115K claims to 5K high-signal examples enables efficient GRPO training and addresses prohibitive computational costs
- A 7B model matches or exceeds 32B baselines and GPT-4.1-mini across 11 diverse claim-verification benchmarks
- Semi-supervised learning capability allows effective training with only 10% labeled data, expanding practical applicability
- Cross-domain performance spanning biomedical, political, scientific, and general claims demonstrates robust generalization