AI · Bearish · arXiv – CS AI · 7h ago · 7/10
NumLeak: Public Numeric Benchmarks as Latent Labels in Foundation Models
Researchers introduce NumLeak, a framework showing that frontier large language models memorize public numeric benchmarks from their pretraining data instead of learning the underlying concepts. When prompted with a date, the models recall financial and economic metrics almost perfectly, but accuracy collapses on recent holdout data, which points to memorization rather than reasoning.
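The date-prompted comparison described above can be illustrated with a minimal Python sketch. This is not the NumLeak implementation. The `Probe` record, the `hit_rate` and `recall_gap` helpers, the 1% relative tolerance, and the single `cutoff` date are all assumptions made for illustration. The idea is to score how often a model's numeric answer matches the published value, separately for dates before and after an assumed training cutoff, and then report the difference.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Probe:
    """One date-keyed question put to a model (hypothetical record layout)."""
    asked_on: date   # date named in the prompt
    truth: float     # published value of the metric on that date
    answer: float    # number the model returned


def hit_rate(probes, rel_tol=0.01):
    """Fraction of probes whose answer is within rel_tol of the published value."""
    if not probes:
        return float("nan")  # no probes in this split, so no rate to report
    hits = sum(abs(p.answer - p.truth) <= rel_tol * abs(p.truth) for p in probes)
    return hits / len(probes)


def recall_gap(probes, cutoff, rel_tol=0.01):
    """Hit rate on dates before the cutoff minus hit rate on dates on or after it."""
    seen = [p for p in probes if p.asked_on < cutoff]
    holdout = [p for p in probes if p.asked_on >= cutoff]
    return hit_rate(seen, rel_tol) - hit_rate(holdout, rel_tol)
```

A gap near zero would suggest similar accuracy on both sides of the cutoff, while a large positive gap matches the pattern the summary describes: strong recall on dates the model may have seen and weak recall on the holdout period.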