AINeutralarXiv – CS AI · 6h ago6/10
🧠
DataDignity: Training Data Attribution for Large Language Models
Researchers introduce DataDignity, a new framework for attributing large language model outputs to specific training documents. The study presents FakeWiki, a benchmark of 3,537 fabricated Wikipedia articles designed to test provenance tracking, and proposes ScoringModel, a supervised contrastive ranker that improves document attribution accuracy from 35% to 52.2% recall compared to existing baselines.