
According to Me: Long-Term Personalized Referential Memory QA

arXiv – CS AI | Jingbiao Mei, Jinghong Chen, Guangyu Yang, Xinyu Hou, Margaret Li, Bill Byrne | March 3, 2026 at 05:00 AM

🤖 AI Summary

Researchers introduce ATM-Bench, the first benchmark for evaluating how well AI assistants recall and reason over long-term personalized memory across multiple modalities. Current state-of-the-art memory systems score under 20% accuracy on the ATM-Bench-Hard set, which points to significant limitations in personalized AI capabilities.

Key Takeaways

→ ATM-Bench is the first multimodal benchmark for testing AI assistants' long-term personalized memory capabilities across images, videos, and emails.
→ Current state-of-the-art memory systems achieve less than 20% accuracy on the challenging ATM-Bench-Hard dataset.
→ The benchmark includes four years of privacy-preserving personal memory data with human-annotated question-answer pairs.
→ Schema-Guided Memory (SGM) outperforms traditional Descriptive Memory approaches used in previous research.
→ The research exposes critical gaps in AI systems' ability to handle personalized references and multi-source reasoning.
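
The summary does not say how Schema-Guided Memory or Descriptive Memory are built, so the following is only a rough illustration of the contrast, not the paper's method. It assumes that "descriptive" means each memory item is stored as a free-text caption, and that "schema-guided" means each item is stored as a record with fixed, typed fields. The `MemoryItem` class, its field names, the helper functions, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical descriptive memory: every item is a free-text caption.
descriptive_memory = [
    "A photo of a birthday dinner at an Italian restaurant in Cambridge in March 2023.",
    "An email confirming a flight to Tokyo departing 12 May 2024.",
]


# Hypothetical schema-guided memory: every item fills the same typed fields.
@dataclass
class MemoryItem:
    source: str   # assumed values: "image", "video", or "email"
    when: date
    place: str
    event: str


schema_memory = [
    MemoryItem("image", date(2023, 3, 18), "Cambridge", "birthday dinner"),
    MemoryItem("email", date(2024, 5, 12), "Tokyo", "flight confirmation"),
]


def find_descriptive(captions, keyword):
    """Search captions by case-insensitive substring match."""
    return [c for c in captions if keyword.lower() in c.lower()]


def find_by_year(items, year):
    """Search structured items by comparing a typed field, with no text parsing."""
    return [m.event for m in items if m.when.year == year]


print(find_descriptive(descriptive_memory, "tokyo"))
print(find_by_year(schema_memory, 2023))
```

Under these assumptions, a question such as "what did I do in 2023?" becomes a field comparison over structured records, whereas the caption store has to rely on matching text. Whether this is the reason SGM scores higher in the paper is not stated in the summary.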