AINeutralarXiv – CS AI · 11h ago7/10
🧠
Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality
Researchers introduce WikiProfile, a benchmark that reframes LLM factuality failures as either missing knowledge or poor recall of encoded information. Analysis of 13 models shows frontier models encode 95-98% of facts but struggle significantly with recall, suggesting future improvements depend less on scaling and more on better knowledge access mechanisms.
🧠 GPT-5🧠 Gemini