AI · Neutral · arXiv – CS AI · 6h ago · 6/10
On the impact of retrieved content representations in RAG Pipelines
Researchers ran a controlled study of how retrieved documents should be represented when the reader is a language model in a RAG pipeline rather than a human. Comparing 14 document representations spanning summarization, selection, and reformulation techniques, they found that answer retention (whether a document still contains its answer-bearing content after transformation) is the primary driver of generation accuracy, while other factors such as wording and length have minimal impact.
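To make the idea of answer retention concrete, here is a minimal Python sketch. It is not the paper's metric: the summary does not give the exact definition, so this assumes a simple normalized substring match against known gold answers. The function names (`normalize`, `retains_answer`, `answer_retention`) are hypothetical and chosen only for illustration.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def retains_answer(doc: str, answers: list[str]) -> bool:
    """True if any gold answer appears in the document after normalization.

    Assumption: substring matching stands in for "answer-bearing content".
    """
    normalized_doc = normalize(doc)
    candidates = [normalize(a) for a in answers]
    return any(c and c in normalized_doc for c in candidates)


def answer_retention(
    original_docs: list[str],
    transformed_docs: list[str],
    answers: list[str],
) -> float:
    """Fraction of answer-bearing originals whose transformed version
    (summary, selection, or reformulation) still contains an answer."""
    pairs = [
        (orig, new)
        for orig, new in zip(original_docs, transformed_docs)
        if retains_answer(orig, answers)
    ]
    if not pairs:
        return 0.0  # no original contained the answer to begin with
    kept = sum(retains_answer(new, answers) for _, new in pairs)
    return kept / len(pairs)


if __name__ == "__main__":
    originals = ["Built in 1889.", "It opened in 1889 too."]
    transformed = ["Built 1889", "It opened later"]
    print(answer_retention(originals, transformed, ["1889"]))
```

Substring matching is deliberately crude: it misses paraphrased answers and can match inside longer tokens, so a real evaluation would likely need a stricter or model-based check.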