y0news
#introspection · 1 article
AI · Neutral · arXiv – CS AI · 4d ago · 6/104

Detecting the Disturbance: A Nuanced View of Introspective Abilities in LLMs

Researchers investigated whether large language models can introspect, testing whether Meta-Llama-3.1-8B-Instruct can detect perturbations to its internal states. They found that the binary detection methods used in prior work were flawed by methodological artifacts, but that models do show partial introspective ability: they localize sentence injections with 88% accuracy and discriminate between injection strengths with 83% accuracy, though only for early-layer perturbations.