AI · Bearish · arXiv – CS AI · 5h ago
$\tau$-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge
Researchers introduced τ-Knowledge, a new benchmark for evaluating AI conversational agents in knowledge-intensive environments, specifically testing their ability to retrieve and apply unstructured domain knowledge. Even frontier AI models achieved a success rate of only 25.5% when navigating complex fintech customer support scenarios backed by 700 interconnected knowledge documents.