y0news

#multi-turn-evaluation News & Analysis

1 article tagged with #multi-turn-evaluation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 8h ago · 7/10

Useless but Safe? Benchmarking Utility Recovery with User Intent Clarification in Multi-Turn Conversations

Researchers introduce CarryOnBench, a new interactive benchmark that evaluates whether large language models can recover helpfulness when users clarify benign intent across multi-turn conversations while maintaining safety. Testing 14 models with nearly 24,000 responses reveals that models significantly withhold information due to intent misinterpretation rather than knowledge limitations, and identifies three failure modes (utility lock-in, unsafe recovery, and repetitive recovery) that single-turn safety evaluations miss.