arXiv – CS AI
STORM: Stepwise Token Optimization with Reward-Guided Beam Search
Researchers introduce STORM, a self-supervised framework that optimizes lexical query expansion for information retrieval by using rewards derived from BM25 to guide generation. The approach lets smaller language models (0.6B to 8B parameters) match larger proprietary query rewriters while retaining BM25's retrieval speed, and it demonstrates zero-shot transfer across 18 languages.
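The summary does not describe the search procedure in detail, so what follows is only a minimal sketch of the general idea of reward-guided beam search for lexical query expansion. It is not the authors' implementation. It assumes a fixed list of candidate expansion terms standing in for a language model's token proposals, pre-tokenized documents, Okapi BM25 with common default parameters (k1=1.5, b=0.75), and a hypothetical reward equal to the reciprocal rank of a known target document. All function names are illustrative.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against query_terms with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            # Smoothed IDF (always positive), as used in Lucene-style BM25
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores

def reward(query_terms, docs, target):
    """Hypothetical reward: reciprocal rank of the target doc under BM25.
    Ties count against the target, so a query matching nothing scores poorly."""
    scores = bm25_scores(query_terms, docs)
    rank = 1 + sum(1 for i, s in enumerate(scores)
                   if i != target and s >= scores[target])
    return 1.0 / rank

def reward_guided_beam_search(query, candidates, docs, target,
                              beam_width=2, steps=2):
    """Append one expansion term per step, keeping the beam_width
    expansions with the highest reward (assumed stand-in for LM tokens)."""
    beams = [(reward(query, docs, target), list(query))]
    for _ in range(steps):
        expanded = []
        for _, terms in beams:
            for tok in candidates:
                if tok in terms:
                    continue  # avoid repeating a term within one expansion
                new_terms = terms + [tok]
                expanded.append((reward(new_terms, docs, target), new_terms))
        if not expanded:
            break
        # Stable sort: ties keep candidate order, so results are deterministic
        beams = sorted(expanded, key=lambda x: x[0], reverse=True)[:beam_width]
    return beams[0]

if __name__ == "__main__":
    docs = [
        "the python programming language tutorial".split(),
        "python snake species habitat jungle".split(),
        "java programming language guide".split(),
    ]
    candidates = ["programming", "snake", "tutorial", "jungle", "language"]
    best_reward, best_terms = reward_guided_beam_search(
        ["python"], candidates, docs, target=0)
    print(best_reward, best_terms)
```

Scoring each partial expansion against BM25 at every step, instead of only scoring the finished query, is what makes the search "stepwise": terms that pull the ranking toward a competing document (here, "snake" or "jungle") are pruned early.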