
STORM: Stepwise Token Optimization with Reward-Guided Beam Search

arXiv – CS AI | Arthur Satouf, Giulio D'Erasmo, Yuxuan Zong, Habiboulaye Amadou Boubacar, Pablo Piantanida, Benjamin Piwowarski | June 10, 2026 at 04:00 AM

🤖 AI Summary

Researchers introduce STORM, a self-supervised framework that optimizes lexical query expansion for information retrieval by feeding BM25 reward signals into generation. The approach lets smaller language models (0.6B to 8B parameters) match larger proprietary rewriters while keeping BM25's retrieval speed, and it transfers zero-shot across 18 languages.

Analysis

STORM addresses a fundamental tension in modern information retrieval: dense neural models offer superior relevance but require expensive index rebuilding whenever the model changes, while lexical systems like BM25 remain fast and interpretable but struggle with vocabulary mismatch, where relevant documents use different words than the query. The researchers resolve this by embedding retrieval feedback directly into generation, treating BM25 scores as token-level supervision rather than a delayed sequence-level reward. Because the rewriter receives a retrieval signal as each expansion term is produced, even small models learn which expansions actually improve retrieval rather than which merely read as well-formed text.
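To make the token-level idea concrete, here is a minimal Python sketch. It assumes a toy three-document corpus, a hand-rolled Okapi BM25 scorer (Lucene-style IDF, k1=0.9, b=0.4), and the reciprocal rank of one known relevant document as the retrieval metric. The summary does not specify STORM's exact reward, so these choices and the helper names (`BM25`, `reciprocal_rank`, `token_rewards`) are illustrative assumptions, not the authors' code. Each appended expansion token is rewarded by the change in the metric it causes, so the per-token rewards sum to the final metric minus the metric of the unexpanded query.

```python
import math
from collections import Counter

# Toy corpus (hypothetical). d1 is relevant to the query "flu" but never uses that word.
DOCS = {
    "d1": "influenza symptoms include fever and cough",
    "d2": "flu shots open each autumn season",
    "d3": "cough syrup prices rose this winter",
}


def tokenize(text):
    return text.lower().split()


class BM25:
    """Minimal Okapi BM25 with a non-negative (Lucene-style) IDF."""

    def __init__(self, docs, k1=0.9, b=0.4):
        self.k1, self.b = k1, b
        self.doc_tf = {d: Counter(tokenize(t)) for d, t in docs.items()}
        self.doc_len = {d: sum(tf.values()) for d, tf in self.doc_tf.items()}
        self.avgdl = sum(self.doc_len.values()) / len(docs)
        n = len(docs)
        df = Counter(term for tf in self.doc_tf.values() for term in tf)
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query_terms, doc_id):
        tf, dl = self.doc_tf[doc_id], self.doc_len[doc_id]
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        return sum(
            self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in query_terms
            if t in tf
        )


def reciprocal_rank(bm25, query_terms, relevant):
    """1 / rank of the relevant document; ties are broken against it (pessimistic)."""
    rel_score = bm25.score(query_terms, relevant)
    ahead = sum(
        1
        for d in bm25.doc_tf
        if d != relevant and bm25.score(query_terms, d) >= rel_score
    )
    return 1.0 / (1 + ahead)


def token_rewards(bm25, query, expansion, relevant):
    """Reward each appended expansion token by the change in reciprocal rank it causes."""
    terms = tokenize(query)
    prev = reciprocal_rank(bm25, terms, relevant)
    rewards = []
    for tok in expansion:
        terms = terms + [tok]
        cur = reciprocal_rank(bm25, terms, relevant)
        rewards.append(cur - prev)  # immediate, per-token signal
        prev = cur
    return rewards


bm25 = BM25(DOCS)
rewards = token_rewards(bm25, "flu", ["influenza", "syrup", "fever"], "d1")
print([round(r, 3) for r in rewards])  # [0.167, -0.167, 0.667]
```

The unhelpful token `syrup` receives a negative reward even though the finished expansion ranks `d1` first. A single sequence-level reward would only see the final reciprocal rank of 1.0 and could not tell which token hurt.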

The approach responds to a growing recognition that LLM query rewriting, despite its promise, often produces fluent but retrieval-ineffective text. By pruning beam-search candidates at every decoding step according to a retrieval metric, STORM concentrates model capacity on vocabulary that demonstrably helps lexical search. The payoff is strong effectiveness at low cost: 8B-parameter models match competitive proprietary systems on the TREC DL and BEIR benchmarks, while retrieval itself keeps BM25's speed.
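The per-step pruning loop can be sketched in the same toy setting. This is a minimal illustration under stated assumptions, not STORM's implementation: `toy_proposer` is a hypothetical stand-in for the rewriter model that returns candidate tokens with made-up log-probabilities, the reward is again the reciprocal rank of a known relevant document under a small BM25 scorer, and beams are ranked by that reward with cumulative log-probability breaking ties. The function names and the `steps` and `beam_width` parameters are illustrative.

```python
import math
from collections import Counter

# Toy corpus (hypothetical); d1 is the relevant document for the query "flu".
DOCS = {
    "d1": "influenza symptoms include fever and cough",
    "d2": "flu shots open each autumn season",
    "d3": "cough syrup prices rose this winter",
}


def bm25_scores(docs, query_terms, k1=0.9, b=0.4):
    """Okapi BM25 score of every document (recomputes statistics each call; fine for a toy)."""
    tfs = {d: Counter(text.lower().split()) for d, text in docs.items()}
    lens = {d: sum(tf.values()) for d, tf in tfs.items()}
    n, avgdl = len(docs), sum(lens.values()) / len(docs)
    df = Counter(t for tf in tfs.values() for t in tf)
    scores = {}
    for d, tf in tfs.items():
        norm = k1 * (1 - b + b * lens[d] / avgdl)
        scores[d] = sum(
            math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1) * tf[t] * (k1 + 1) / (tf[t] + norm)
            for t in query_terms
            if t in tf
        )
    return scores


def reciprocal_rank(docs, query_terms, relevant):
    """Retrieval reward: 1 / rank of the relevant document, ties counted against it."""
    scores = bm25_scores(docs, query_terms)
    ahead = sum(1 for d, s in scores.items() if d != relevant and s >= scores[relevant])
    return 1.0 / (1 + ahead)


def toy_proposer(context):
    """Hypothetical stand-in for the rewriter LM: (token, log-prob) candidates."""
    return [("season", -0.5), ("shots", -0.7), ("influenza", -1.2), ("fever", -1.5)]


def reward_guided_beam_search(docs, query, relevant, propose, steps=2, beam_width=2):
    base = query.lower().split()
    beams = [((), 0.0)]  # (expansion tokens so far, cumulative log-prob)
    for _ in range(steps):
        candidates = []
        for expansion, logp in beams:
            for tok, tok_logp in propose(base + list(expansion)):
                if tok in expansion:
                    continue  # do not repeat an expansion term
                new = expansion + (tok,)
                reward = reciprocal_rank(docs, base + list(new), relevant)
                candidates.append((reward, logp + tok_logp, new))
        if not candidates:
            break
        # Prune after every step: best retrieval reward first, LM log-prob breaks ties.
        candidates.sort(key=lambda c: (c[0], c[1]), reverse=True)
        beams = [(new, logp) for _, logp, new in candidates[:beam_width]]
    return list(beams[0][0])


print(reward_guided_beam_search(DOCS, "flu", "d1", toy_proposer))  # ['influenza', 'fever']
```

With these made-up log-probabilities, simply taking the two most likely tokens would give `season shots`, which leaves `d1` at reciprocal rank 1/3. Pruning on the retrieval reward keeps `influenza fever` instead, which ranks `d1` first.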

The zero-shot transfer to the 18 languages of MIRACL suggests that the approach captures general principles of effective query expansion rather than dataset-specific patterns. Because the expanded BM25 queries outperform dedicated multilingual dense retrievers on average, the result also challenges the assumption that dense retrievers are necessary for strong multilingual search. For organizations with existing BM25 infrastructure, STORM offers a path to state-of-the-art retrieval without expensive reindexing or dependence on proprietary models. The work signals a pragmatic shift toward hybrid systems that apply neural optimization inside lexical retrieval frameworks.

Key Takeaways

→STORM uses token-level BM25 rewards during beam search to train smaller models on retrieval-effective query expansion rather than language fluency alone
→Models with 0.6B to 8B parameters match or exceed larger proprietary rewriters while keeping BM25's speed advantage on standard inverted indexes
→The framework demonstrates zero-shot transfer across 18 languages in MIRACL, outperforming dedicated multilingual dense retrievers on average
→Embedding retrieval metrics directly into generation transforms delayed sequence-level supervision into immediate token-level signals that guide exploration
→STORM provides an infrastructure-light alternative to dense neural retrieval that does not require corpus reindexing when models change
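The no-reindexing and speed points follow from where the expansion lives: it only adds terms to the query, so the inverted index is built once and any rewriter queries the same postings. The sketch below assumes a toy corpus and a hand-written BM25 over a term-to-postings map; `InvertedIndex` and the two hypothetical rewriter outputs are illustrative, not part of STORM. Scoring reads only the postings lists of the query terms, which is why lexical retrieval stays fast.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (hypothetical).
DOCS = {
    "d1": "influenza symptoms include fever and cough",
    "d2": "flu shots open each autumn season",
    "d3": "cough syrup prices rose this winter",
}


class InvertedIndex:
    """BM25 over a term -> postings map, built once from the corpus."""

    def __init__(self, docs, k1=0.9, b=0.4):
        self.k1, self.b = k1, b
        self.postings = defaultdict(list)  # term -> [(doc_id, term frequency)]
        self.doc_len = {}
        for doc_id, text in docs.items():
            tf = Counter(text.lower().split())
            self.doc_len[doc_id] = sum(tf.values())
            for term, freq in tf.items():
                self.postings[term].append((doc_id, freq))
        self.avgdl = sum(self.doc_len.values()) / len(docs)

    def search(self, query_terms, k=3):
        n = len(self.doc_len)
        scores = defaultdict(float)
        for term in query_terms:
            plist = self.postings.get(term, [])  # read-only: .get never inserts a key
            if not plist:
                continue
            idf = math.log((n - len(plist) + 0.5) / (len(plist) + 0.5) + 1)
            for doc_id, freq in plist:
                norm = self.k1 * (1 - self.b + self.b * self.doc_len[doc_id] / self.avgdl)
                scores[doc_id] += idf * freq * (self.k1 + 1) / (freq + norm)
        return sorted(scores, key=scores.get, reverse=True)[:k]


index = InvertedIndex(DOCS)  # built once, before any rewriter runs
snapshot = {term: list(plist) for term, plist in index.postings.items()}

# Hypothetical expansions for the query "flu" from a rewriter before and after a model swap.
expansions = {
    "rewriter_v1": ["flu", "influenza", "fever"],
    "rewriter_v2": ["flu", "influenza", "fever", "cough"],
}
for name, terms in expansions.items():
    print(name, index.search(terms))
# rewriter_v1 ['d1', 'd2']
# rewriter_v2 ['d1', 'd2', 'd3']
```

Swapping the rewriter changes only the query terms; the postings built at indexing time are untouched, so no corpus-side work is needed.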