AI · Bearish · arXiv – CS AI · Mar 5 · 6/10
$\tau$-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge
Researchers introduced τ-Knowledge, a new benchmark for evaluating conversational AI agents in knowledge-intensive environments, specifically testing their ability to retrieve and apply unstructured domain knowledge. Even frontier AI models achieved a success rate of only 25.5% when navigating complex fintech customer support scenarios backed by 700 interconnected knowledge documents.