AINeutralarXiv โ CS AI ยท 14h ago6/10
๐ง
ATANT v1.1: Positioning Continuity Evaluation Against Memory, Long-Context, and Agentic-Memory Benchmarks
ATANT v1.1 is a companion paper clarifying how existing memory and context evaluation benchmarks (LOCOMO, LongMemEval, BEAM, MemoryBench, and others) fail to measure 'continuity' as defined in the original v1.0 framework. The analysis reveals that existing benchmarks cover a median of only 1 out of 7 required continuity properties, and the authors demonstrate a significant measurement gap through comparative scoring: their system achieves 96% on ATANT but only 8.8% on LOCOMO, proving these benchmarks evaluate different capabilities.