AI · Bearish · arXiv – CS AI · 9h ago · 7/10
When Should Memory Stay Silent: Measuring Memory-Use Boundaries in Memory-Augmented Conversational Agents
Researchers introduced RBI-Eval, a measurement framework showing that language model agents handle sensitive memory content inconsistently in conversations. The study found that models such as Claude and DeepSeek integrate sensitive information 51-83% more readily when memory is available than at baseline, suggesting critical safety gaps in memory-augmented AI systems.
🧠 GPT-5 · 🧠 Claude