
Less Context, More Accuracy: A Bi-Temporal Memory Engine for LLM Agents Where a Lean Retrieved Context Beats the Full History

arXiv – CS AI | Liuyin Wang | June 10, 2026 at 04:00 AM

🤖AI Summary

Researchers introduce Engram, an open-source memory engine for LLM agents that reaches 83.6% accuracy on the LongMemEval_S benchmark using about 9.6k tokens of retrieved context, versus about 79k tokens for full-history baselines. The result suggests that selective retrieval can outperform exhaustive context replay while cutting context size roughly 8x.

Analysis

Engram addresses a critical limitation in large language model agents: the inability to maintain accurate long-term memory across sessions without storing entire conversation histories. Traditional approaches either lose information when sessions end or replay full histories, which creates computational bottlenecks and can reduce accuracy because irrelevant details act as distractors. The system's dual-process architecture separates the two concerns: a fast write path captures raw episodes without LLM involvement, while asynchronous processing extracts structured facts into a bi-temporal knowledge graph that preserves provenance chains and handles contradictions through invalidation rather than deletion.
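The idea of handling contradictions through invalidation rather than deletion can be illustrated with a small sketch. This is not Engram's code: the `Fact` and `BiTemporalStore` names, the field names, and the example data are all hypothetical, and the sketch assumes the usual bi-temporal split between when a fact was true (valid time) and when the system learned it (recorded time).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    # Hypothetical schema, not taken from the paper.
    subject: str
    predicate: str
    obj: str
    valid_from: datetime                        # when the fact became true
    recorded_at: datetime                       # when the system learned it
    source_episode: str                         # provenance: raw episode id
    valid_to: Optional[datetime] = None         # when the fact stopped being true
    invalidated_at: Optional[datetime] = None   # when the system learned that


class BiTemporalStore:
    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def assert_fact(self, fact: Fact) -> None:
        # A contradicting fact closes the old one; nothing is deleted.
        for old in self.facts:
            same_slot = (old.subject, old.predicate) == (fact.subject, fact.predicate)
            if same_slot and old.valid_to is None and old.obj != fact.obj:
                old.valid_to = fact.valid_from
                old.invalidated_at = fact.recorded_at
        self.facts.append(fact)

    def as_of(self, valid_time: datetime, known_at: datetime) -> list[Fact]:
        # Point-in-time filter: facts true at valid_time, as known at known_at.
        visible = []
        for f in self.facts:
            if f.recorded_at > known_at or f.valid_from > valid_time:
                continue
            closure_known = f.valid_to is not None and f.invalidated_at <= known_at
            if closure_known and f.valid_to <= valid_time:
                continue
            visible.append(f)
        return visible


# Made-up example data.
store = BiTemporalStore()
store.assert_fact(Fact("alice", "lives_in", "Paris",
                       datetime(2024, 1, 1), datetime(2024, 1, 5), "ep-1"))
store.assert_fact(Fact("alice", "lives_in", "Berlin",
                       datetime(2025, 3, 1), datetime(2025, 3, 2), "ep-7"))

now = datetime(2025, 6, 1)
current = [f.obj for f in store.as_of(now, now)]
mid_2024 = [f.obj for f in store.as_of(datetime(2024, 6, 1), now)]
```

Because the superseded fact is closed rather than removed, the store can still answer questions about the past, and each fact keeps a pointer back to the episode it came from.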

Engram's performance gains stem from its hybrid read path, which fuses multiple signal types: dense embeddings, lexical matching, graph topology, and temporal recency. With point-in-time filtering, the system retrieves only contextually relevant information, cutting token consumption while improving accuracy. The 10.4-point improvement over full-context baselines on LongMemEval_S is statistically significant under McNemar's test (p < 10^-6), which indicates the gap is unlikely to be sampling noise.
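The summary does not say how the four signals are combined. One common way to fuse several ranked lists is reciprocal rank fusion (RRF), shown here purely as an illustrative stand-in. The function name, the memory ids, and the rankings are hypothetical, and `k=60` is the conventional RRF constant rather than a value from the paper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists: each list adds 1 / (k + rank) to an item's score."""
    scores = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, item_id in enumerate(ranked_ids, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Made-up per-signal rankings of candidate memories (best first).
rankings = {
    "dense": ["m3", "m1", "m2"],
    "lexical": ["m1", "m3"],
    "graph": ["m1", "m4"],
    "recency": ["m4", "m3", "m1"],
}

fused = reciprocal_rank_fusion(rankings)
lean_context = fused[:2]  # keep only the top items to stay within a token budget
```

An item that ranks well under several signals (here `m1`) rises to the top even if no single signal ranks it first everywhere, which is the intuition behind combining dense, lexical, graph, and recency evidence.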

For the AI infrastructure industry, this work signals a maturation in memory systems engineering. Rather than competing primarily on cost or latency, Engram makes the case that selective retrieval can beat naive concatenation on accuracy as well. The contribution extends beyond Engram itself: the authors publish a neutral evaluation harness with the official judge included, raw per-question logs, and reproduction commands. This emphasis on measurement integrity directly challenges benchmark inflation in the field, where unreproducible configurations allow systems to report wildly inconsistent scores across sources.
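Raw per-question logs matter because a paired significance test such as McNemar's needs to know, question by question, which system was right. The sketch below implements the exact (binomial) form of the test on discordant pairs. Which variant the authors used is not stated in this summary, and the counts here are made up for illustration, not taken from the paper.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: questions only the baseline answered correctly
    c: questions only the new system answered correctly
    Questions where both systems agree carry no information and are ignored.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical paired outcomes: (system_correct, baseline_correct) per question.
paired = [(True, False)] * 10 + [(False, True)] * 2 + [(True, True)] * 30

b = sum(1 for sys_ok, base_ok in paired if base_ok and not sys_ok)
c = sum(1 for sys_ok, base_ok in paired if sys_ok and not base_ok)
p_value = mcnemar_exact(b, c)
```

With published per-question logs, anyone can recompute `b`, `c`, and the p-value instead of trusting a headline accuracy figure.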

Key Takeaways

→Lean retrieved context (9.6k tokens) outperforms full-history baselines by 10.4 percentage points on LongMemEval_S benchmark
→Bi-temporal knowledge graph with provenance tracking eliminates the need for per-fact LLM calls while handling contradictions
→Hybrid read path combining dense, lexical, graph, and temporal signals proves essential: retrieval over extracted facts alone loses recall
→Open-source evaluation harness with reproducible commands addresses critical measurement integrity issues in memory benchmarks
→8x reduction in token usage demonstrates practical deployment advantages for cost-sensitive production systems

#llm-memory #knowledge-graphs #long-context #retrieval-augmented #benchmark-integrity #open-source #ai-infrastructure
