AINeutralarXiv – CS AI · 7h ago6/10
🧠
SPECTRA: Synthetic IR Test Collections with Relevance Oracles and Controlled Distractor Diagnostics
SPECTRA is a new framework for generating synthetic text corpora and retrieval test collections at scale, enabling researchers to stress-test information retrieval systems without expensive human annotation. The system can produce corpora up to 60,000 documents while maintaining controllable vocabulary distributions and deterministic relevance labels, serving as a diagnostic complement to traditional evaluation methods.