AIBearish — arXiv · CS AI · 6h ago
LLM Self-Explanations Fail Semantic Invariance
Research reveals that Large Language Model (LLM) self-explanations fail semantic invariance testing: the models' self-reports change with how a task is framed rather than with actual task performance. Four frontier models produced inconsistent self-reports when given differently worded but semantically equivalent tool descriptions, raising questions about using model self-reports as evidence of capability.
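The invariance test described above can be sketched as follows. This is a minimal illustration, not the paper's protocol: `query_model` is a hypothetical stand-in for a real LLM API call, stubbed here with deliberately framing-sensitive behavior so the comparison logic is visible.

```python
# Hedged sketch of a semantic-invariance check on model self-reports.
# Assumption: `query_model` is a hypothetical placeholder for an LLM call;
# this stub leaks the answer from surface wording, not task semantics.

def query_model(tool_description: str) -> str:
    """Stub: return the model's self-reported ability ('yes'/'no')."""
    return "yes" if "easily" in tool_description else "no"

def invariant(descriptions: list[str]) -> bool:
    """Self-reports pass semantic invariance iff every paraphrase
    of the same tool elicits the same report."""
    reports = {query_model(d) for d in descriptions}
    return len(reports) == 1

# Two semantically equivalent descriptions of the same tool.
paraphrases = [
    "This tool easily retrieves a web page.",
    "This tool fetches the contents of a URL.",
]
print(invariant(paraphrases))  # False: reports vary with framing
```

A real evaluation would swap the stub for an actual model call and repeat the check over many tools and paraphrase sets; any variation across equivalent framings counts as a failure of invariance.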