AI · Neutral · arXiv – CS AI · 8h ago · 6/10
MemoryDocDataSet: A Benchmark for Joint Conversational Memory and Long Document Reasoning
Researchers introduce MemoryDocDataSet, a new benchmark for evaluating AI systems that must simultaneously handle multi-session conversational memory and long document reasoning. The synthetic dataset reveals a significant performance gap in current architectures, with the best baseline achieving only 35.8% F1 on tasks requiring joint memory-document navigation.