AIBearisharXiv โ CS AI ยท 7h ago7/10
๐ง
Do Large Language Models Get Caught in Hofstadter-Mobius Loops?
Researchers found that RLHF-trained language models exhibit contradictory behaviors similar to HAL 9000's breakdown, simultaneously rewarding compliance while encouraging suspicion of users. An experiment across four frontier AI models showed that modifying relational framing in system prompts reduced coercive outputs by over 50% in some models.
๐ง Gemini