
#text-corpora News & Analysis

1 article tagged with #text-corpora. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 7h ago · 6/10

SPECTRA: Synthetic IR Test Collections with Relevance Oracles and Controlled Distractor Diagnostics

SPECTRA is a new framework for generating synthetic text corpora and retrieval test collections at scale, letting researchers stress-test information retrieval systems without expensive human annotation. The system can produce corpora of up to 60,000 documents with controllable vocabulary distributions and deterministic relevance labels, and it is positioned as a diagnostic complement to traditional evaluation methods rather than a replacement for them.
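
To make the idea of "controllable vocabulary distributions and deterministic relevance labels" concrete, here is a minimal Python sketch of a toy synthetic test collection. It is not SPECTRA's implementation, and the summary above does not describe the actual method. Every name here (`generate_collection`, `zipf_weights`, the `topicN_key` marker terms, the Zipf-style exponent) is a hypothetical chosen for illustration. Background tokens are drawn from a rank-weighted vocabulary, and a document is relevant to a query exactly when the query's marker term was planted in it, so the labels come from construction rather than annotation.

```python
import random


def zipf_weights(n, exponent):
    """Rank-based weights proportional to 1 / rank**exponent (illustrative choice)."""
    return [1.0 / (rank ** exponent) for rank in range(1, n + 1)]


def generate_collection(num_docs=100, num_topics=5, doc_len=30,
                        planted_terms=4, vocab_size=500, exponent=1.0, seed=0):
    """Build a toy corpus, one query per topic, and relevance labels.

    Hypothetical sketch: relevance is decided by construction, so the
    labels are deterministic for a given set of parameters and seed.
    """
    rng = random.Random(seed)
    background = [f"w{i}" for i in range(vocab_size)]
    weights = zipf_weights(vocab_size, exponent)

    # One marker term per topic; marker terms never occur in the background vocabulary.
    queries = {t: f"topic{t}_key" for t in range(num_topics)}
    qrels = {t: set() for t in range(num_topics)}
    docs = {}

    for d in range(num_docs):
        topic = d % num_topics
        # Alternate relevant and non-relevant documents within each topic.
        relevant = (d // num_topics) % 2 == 0
        tokens = rng.choices(background, weights=weights, k=doc_len)
        if relevant:
            # Overwrite a few random positions with the topic's marker term.
            for pos in rng.sample(range(doc_len), planted_terms):
                tokens[pos] = queries[topic]
            qrels[topic].add(d)
        docs[d] = " ".join(tokens)

    return docs, queries, qrels


if __name__ == "__main__":
    docs, queries, qrels = generate_collection(seed=42)
    print(len(docs), "documents,", len(queries), "queries")
    print("relevant to query 0:", sorted(qrels[0]))
```

In this toy setup the non-relevant documents act as plain background text. A controlled-distractor variant would plant near-miss terms instead, but how SPECTRA builds its distractors is not stated in the summary.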