AIBullish – arXiv · CS AI · 7h ago · 7/10
WRAP++: Web discoveRy Amplified Pretraining
WRAP++ is a new pretraining technique that enhances language model training by discovering cross-document relationships through web hyperlinks and synthesizing multi-document question-answer pairs from the linked documents. By amplifying ~8.4B tokens of source text into 80B tokens of relational QA data, the method enables models like OLMo to achieve significant gains on factual retrieval tasks compared to single-document synthesis approaches.
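The first stage described above, linking documents through their hyperlinks so that related pages can be grouped into multi-document contexts for QA synthesis, can be sketched roughly as follows. This is a minimal illustration, not the paper's pipeline; the function names, the toy corpus, and the href-regex extraction are all assumptions made for this example.

```python
import re

def extract_links(doc: dict) -> list[str]:
    """Return hrefs found in a document's HTML body (toy extraction)."""
    return re.findall(r'href="([^"]+)"', doc["html"])

def pair_by_hyperlink(corpus: dict[str, dict]) -> list[tuple[str, str]]:
    """Pair each document with the in-corpus documents it links to.

    Each (source, target) pair is a candidate multi-document context
    that a generator model could turn into relational QA pairs.
    """
    pairs = []
    for url, doc in corpus.items():
        for target in extract_links(doc):
            if target in corpus and target != url:
                pairs.append((url, target))
    return pairs

# Hypothetical two-document corpus: A links to B.
corpus = {
    "a.com": {"html": 'See <a href="b.com">B</a>.', "text": "Doc A"},
    "b.com": {"html": "No links here.", "text": "Doc B"},
}
print(pair_by_hyperlink(corpus))  # [('a.com', 'b.com')]
```

Each discovered pair would then be fed to a synthesis model that writes questions answerable only by combining both documents, which is the "amplification" step that grows the token count.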