AI · Neutral · arXiv CS AI · 6h ago
According to Me: Long-Term Personalized Referential Memory QA
Researchers introduce ATM-Bench, the first benchmark for evaluating AI assistants' ability to recall and reason over long-term personalized memory across multiple modalities. The benchmark reveals poor performance (under 20% accuracy) for current state-of-the-art memory systems, highlighting significant limitations in personalized AI capabilities.