AI · Neutral · arXiv – CS AI · 15h ago · 6/10
VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions
Researchers introduce VitaBench 2.0, a benchmark for evaluating how well large language models act as personalized and proactive agents over extended user interactions. The benchmark shows that state-of-the-art models struggle with real-world personalization tasks, exposing a substantial gap between what these models can do today and what long-term user collaboration requires.