AINeutralarXiv โ CS AI ยท 14h ago6/10
๐ง
Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs
Researchers introduce a multi-agent framework to map data lineage in large language models, revealing how post-training datasets evolve and interconnect. The analysis uncovers structural redundancy, benchmark contamination propagation, and proposes lineage-aware dataset construction to improve LLM training diversity and quality.