AI · Neutral · arXiv (CS AI) · 8h ago · 7/10
Useless but Safe? Benchmarking Utility Recovery with User Intent Clarification in Multi-Turn Conversations
Researchers introduce CarryOnBench, a new interactive benchmark that evaluates whether large language models can recover helpfulness when users clarify benign intent across multi-turn conversations while maintaining safety. Testing 14 models with nearly 24,000 responses reveals that models significantly withhold information due to intent misinterpretation rather than knowledge limitations, and identifies three failure modes (utility lock-in, unsafe recovery, and repetitive recovery) that single-turn safety evaluations miss.