AI · Neutral · arXiv – CS AI · 10h ago · 6/10
Generating Leakage-Free Benchmarks for Robust RAG Evaluation
Researchers introduce SeedRG, a benchmark generation pipeline that addresses knowledge leakage in retrieval-augmented generation (RAG) evaluation by creating novel, structurally similar test instances that cannot be answered from a language model's existing parametric memory. The approach targets benchmark aging, in which reused datasets become less effective for evaluation as their content is absorbed into model training data.