y0news

#pretraining-data News & Analysis

1 article tagged with #pretraining-data. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

1 article
AI · Neutral · arXiv – CS AI · 6h ago · 6/10

Gap-K%: Measuring Top-1 Prediction Gap for Detecting Pretraining Data

Researchers propose Gap-K%, a novel method for detecting whether text was part of an LLM's pretraining data by analyzing the probability gap between a model's top prediction and the actual target token. The technique outperforms existing approaches on standard benchmarks and addresses critical privacy and copyright concerns surrounding the opaque datasets used to train large language models.
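To make the idea concrete, here is a minimal sketch of how a gap-based membership score could be computed. It is an illustration, not the paper's implementation: the summary only states that the method looks at the gap between the model's top prediction and the actual target token. The aggregation step (averaging the K% largest gaps, by analogy with Min-K%-style detectors), the sign convention, the function names `top1_gaps` and `gap_k_score`, and the toy probabilities are all assumptions made for this example.

```python
import math

def top1_gaps(log_probs, targets):
    """Per-position gap: log p(top-1 token) - log p(actual target token).

    log_probs: one list of vocabulary log-probabilities per position.
    targets:   the index of the token that actually appears at each position.
    The gap is 0 when the target is the model's top prediction, else positive.
    """
    return [max(row) - row[t] for row, t in zip(log_probs, targets)]

def gap_k_score(log_probs, targets, k=0.2):
    """Hypothetical aggregation (an assumption, not taken from the paper):
    average the k fraction of largest gaps and negate, so that a higher
    score suggests the text is more likely to have been seen in training."""
    gaps = sorted(top1_gaps(log_probs, targets), reverse=True)
    n = max(1, math.ceil(k * len(gaps)))
    return -sum(gaps[:n]) / n

# Toy next-token distributions over a 3-token vocabulary (made-up numbers).
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
log_probs = [[math.log(p) for p in row] for row in probs]

# Targets that match the top-1 prediction everywhere ("memorized-looking" text)
seen_like = [0, 1, 2, 0]
# Targets the model ranks low ("unfamiliar" text)
unseen_like = [2, 0, 0, 2]

seen_score = gap_k_score(log_probs, seen_like, k=0.5)
unseen_score = gap_k_score(log_probs, unseen_like, k=0.5)
print(seen_score > unseen_score)
```

In practice the log-probabilities would come from a real language model's output distribution, and a decision threshold on the score would need to be calibrated on texts with known membership.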