AI · Neutral · arXiv – CS AI · 9h ago · 6/10
LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs
Researchers introduce PropMe, a framework that distinguishes an LLM's capability to leak training data under direct attack from its propensity to do so during normal use. Tests on open models reveal a significant gap: adversarial prompts can force the models to reproduce training data, but the models rarely do so unprompted. This suggests that memorization risk in practical deployment is lower than worst-case evaluations indicate.