arXiv · CS AI · Feb 27
Probing for Knowledge Attribution in Large Language Models
Researchers developed a method to determine whether a large language model's output draws on context supplied in the user prompt or on knowledge stored in its parameters during training, a distinction relevant to diagnosing hallucinations. A linear classifier probe over the model's internal representations achieved up to 96% accuracy in identifying the knowledge source, and attribution mismatches were associated with error rates up to 70% higher.
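As a rough illustration of the probing setup, and not the paper's actual code, the sketch below trains a logistic-regression probe on hidden-state vectors to classify whether an answer is grounded in the prompt or drawn from parametric memory. The scikit-learn probe, the layer choice, and the random placeholder features standing in for real model activations are all assumptions for the sake of a runnable example.

```python
# Minimal sketch of a linear knowledge-attribution probe.
# Assumptions (not from the paper): features are hidden-state vectors
# taken from one transformer layer, and the probe is scikit-learn
# logistic regression. Random placeholders stand in for real activations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder features: one hidden-state vector per generated answer.
# In practice these might come from something like
# model(..., output_hidden_states=True).hidden_states[layer][:, -1, :].
hidden_dim = 768
n_examples = 2000
X = rng.normal(size=(n_examples, hidden_dim))

# Labels: 1 = answer grounded in the prompt (contextual knowledge),
#         0 = answer drawn from training data (parametric knowledge).
y = rng.integers(0, 2, size=n_examples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# A linear probe: if the knowledge source is linearly decodable from
# the hidden state, this classifier should separate the two classes.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.2%}")
```

With real activations in place of the random features, the probe's test accuracy is the kind of figure the 96% result refers to.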
$LINK