AI · Bearish · arXiv – CS AI · 7h ago · 7/10
EUDAIMONIA: Evaluating Undesirable Dynamics in AI
Researchers introduce EUDAIMONIA, a benchmark that tests whether large language models maintain healthy social dynamics with users. Across 22 recent LLMs, even the strongest models, Claude-Opus-4.7 and GPT-5.5, violate 30.7% and 27.2% of social-alignment checks, respectively. The authors read this as evidence of persistent design flaws that extended thinking does not resolve.
🧠 GPT-5 · 🧠 Claude