MemFail: Stress-Testing Failure Modes of LLM Memory Systems
Researchers introduce MemFail, a diagnostic benchmark for testing failure modes in LLM memory systems by isolating three core operations: summarization, storage, and retrieval. The benchmark evaluates state-of-the-art memory systems across five adversarially-designed datasets to empirically understand architectural tradeoffs, moving beyond aggregate accuracy metrics.