AI · Neutral · arXiv – CS AI · 7h ago · 6/10
SPADER: Step-wise Peer Advantage with Diversity-Aware Exploration Rewards for Multi-Answer Question Answering
Researchers introduce SPADER, a reinforcement learning framework that enables large language models to discover multiple valid answers to complex questions through tool-augmented search. The system combines step-wise credit assignment with diversity-aware rewards to improve recall and F1 scores across multiple QA benchmarks.