y0news

#memory-management News & Analysis

30 articles tagged with #memory-management. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 2d ago · 7/10

VikingMem: A Memory Base Management System for Stateful LLM-based Applications

Researchers introduce VikingMem, a memory management system for long-term LLM interactions that addresses context window limitations through selective memory extraction, stateful evolution, and temporal weighting. The system demonstrates a 30% improvement in memory retrieval effectiveness while maintaining low latency, offering a generalizable solution for applications beyond traditional chatbots.

AI · Bullish · arXiv – CS AI · Apr 14 · 7/10

Deep Optimizer States: Towards Scalable Training of Transformer Models Using Interleaved Offloading

Researchers introduce Deep Optimizer States, a technique that reduces GPU memory constraints during large language model training by dynamically offloading optimizer state between host and GPU memory during computation cycles. The method achieves 2.5× faster iterations compared to existing approaches by better managing the memory fluctuations inherent in transformer training pipelines.

AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

LightThinker++: From Reasoning Compression to Memory Management

Researchers developed LightThinker++, a new framework that enables large language models to compress intermediate reasoning thoughts and manage memory more efficiently. The system reduces peak token usage by up to 70% while improving accuracy by 2.42% and maintaining performance over extended reasoning tasks.

AI · Bullish · arXiv – CS AI · Mar 26 · 7/10

ODMA: On-Demand Memory Allocation Strategy for LLM Serving on LPDDR-Class Accelerators

Researchers developed ODMA, a new memory allocation strategy that improves Large Language Model serving performance on memory-constrained accelerators by up to 27%. The technique addresses bandwidth limitations in LPDDR systems through adaptive bucket partitioning and dynamic generation-length prediction.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

StatePlane: A Cognitive State Plane for Long-Horizon AI Systems Under Bounded Context

Researchers introduce StatePlane, a model-agnostic cognitive state management system that enables AI systems to maintain coherent reasoning over long interaction horizons without expanding context windows or retraining models. The system uses episodic, semantic, and procedural memory mechanisms inspired by cognitive psychology to overcome current limitations in large language models.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

Justitia: Fair and Efficient Scheduling of Task-parallel LLM Agents with Selective Pampering

Justitia is a new scheduling system for task-parallel LLM agents that optimizes GPU server performance through selective resource allocation based on completion order prediction. The system uses memory-centric cost quantification and virtual-time fair queuing to achieve both efficiency and fairness in LLM serving environments.

🏢 Meta

AI · Bullish · arXiv – CS AI · Mar 11 · 7/10

The Missing Memory Hierarchy: Demand Paging for LLM Context Windows

Researchers developed Pichay, a demand paging system that treats LLM context windows like computer memory with hierarchical caching. The system reduces context consumption by up to 93% in production by evicting stale content and managing memory more efficiently, addressing fundamental scalability issues in AI systems.

AI · Bullish · arXiv – CS AI · Mar 6 · 7/10

AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems

Researchers introduce AMV-L, a new memory management framework for long-running LLM systems that uses utility-based lifecycle management instead of traditional time-based retention. The system improves throughput by 3.1x and reduces latency by up to 4.7x while maintaining retrieval quality by controlling memory working-set size rather than just retention time.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 2

Neural Paging: Learning Context Management Policies for Turing-Complete Agents

Researchers introduce Neural Paging, a new architecture that addresses the computational bottleneck of finite context windows in Large Language Models by implementing a hierarchical system that decouples reasoning from memory management. The approach reduces computational complexity from O(N²) to O(N·K²) for long-horizon reasoning tasks, potentially enabling more efficient AI agents.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3

FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference

Researchers introduce FreeKV, a training-free optimization framework that dramatically improves KV cache retrieval efficiency for large language models with long context windows. The system achieves up to 13x speedup compared to existing methods while maintaining near-lossless accuracy through speculative retrieval and hybrid memory layouts.

$NEAR

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 7

Contextual Memory Virtualisation: DAG-Based State Management and Structurally Lossless Trimming for LLM Agents

Researchers introduce Contextual Memory Virtualisation (CMV), a system that preserves LLM understanding across extended sessions by treating context as version-controlled state using DAG-based management. The system includes a trimming algorithm that reduces token counts by 20-86% while preserving all user interactions, demonstrating particular efficiency in tool-use sessions.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

Context Distillation as Latent Memory Management

Researchers propose a novel approach to context distillation that treats compressed contextual information as a latent memory management problem, using modular LoRA adapters with intelligent retrieval and self-gating mechanisms to improve efficiency and robustness in machine learning systems.

AI · Bullish · arXiv – CS AI · May 12 · 6/10

KV-RM: Regularizing KV-Cache Movement for Static-Graph LLM Serving

Researchers present KV-RM, a runtime optimization that manages KV-cache memory movement in static-graph LLM decoders, achieving better throughput and reduced latency variability without sacrificing the predictability benefits of static graph execution. The approach decouples logical KV histories from physical storage through a block pager and merge-staged transport mechanism, demonstrating practical improvements on multi-GPU systems.

🏢 Nvidia

AI · Neutral · arXiv – CS AI · May 11 · 6/10

MEMOREPAIR: Barrier-First Cascade Repair in Agentic Memory

Researchers introduce MemoRepair, a system that addresses cascade failures in agentic memory by preventing stale or invalidated information from corrupting downstream AI agent decisions. Using a barrier-first approach and graph-based optimization, the system reduces invalid memory exposure from 69-94% to 0% while maintaining 91-94% of valid successor states with significantly lower repair costs.

AI · Bullish · arXiv – CS AI · May 11 · 6/10

Semantic-Aware Adaptive Visual Memory for Streaming Video Understanding

SAVEMem is a training-free framework that improves real-time video understanding by incorporating semantic awareness into memory management rather than relying solely on visual similarity. The system achieves significant performance gains on streaming video benchmarks while reducing GPU memory consumption by 48%, demonstrating practical advances in efficient AI model inference.

AI · Neutral · arXiv – CS AI · May 4 · 6/10

MemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agents

MemRouter is a new memory management system for conversational AI agents that uses lightweight embedding-based routing instead of expensive LLM generation to decide which conversation turns to store. The approach achieves 52.0 F1 score versus 45.6 for LLM-based alternatives while reducing latency from 970ms to 58ms, suggesting memory admission can be effectively learned through supervised classification rather than generative models.

AI · Neutral · arXiv – CS AI · May 1 · 6/10

TiMem: Temporal-Hierarchical Memory Consolidation for Long-Horizon Conversational Agents

Researchers introduce TiMem, a temporal-hierarchical memory framework that helps conversational AI agents manage long conversation histories beyond LLM context limits. The system organizes interactions through a Temporal Memory Tree, achieving state-of-the-art performance on memory recall benchmarks while reducing memory overhead by over 50%.

AI · Neutral · Microsoft Research Blog · Mar 10 · 6/10

From raw interaction to reusable knowledge: Rethinking memory for AI agents

Microsoft Research highlights a counterintuitive problem where giving AI agents more memory actually reduces their effectiveness. As interaction logs accumulate, they become large, filled with irrelevant content, and difficult to search through, making it harder for agents to find relevant information for current tasks.

AI · Bullish · arXiv – CS AI · Mar 6 · 6/10

Adaptive Memory Admission Control for LLM Agents

Researchers propose Adaptive Memory Admission Control (A-MAC), a new framework for managing long-term memory in LLM-based agents. The system improves memory precision-recall by 31% while reducing latency through structured decision-making based on five interpretable factors rather than opaque LLM-driven policies.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7

Semantic XPath: Structured Agentic Memory Access for Conversational AI

Researchers have developed Semantic XPath, a tree-structured memory system for conversational AI that improves performance by 176.7% over traditional methods while using only 9.1% of the tokens. The system addresses scalability issues in long-term AI conversations by efficiently accessing and updating structured memory instead of appending growing conversation history.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents

Researchers propose ActMem, a novel memory framework for LLM agents that combines memory retrieval with active causal reasoning to handle complex decision-making scenarios. The framework transforms dialogue history into structured causal graphs and uses counterfactual reasoning to resolve conflicts between past states and current intentions, significantly outperforming existing baselines in memory-dependent tasks.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 4

AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations

Researchers introduce AMemGym, an interactive benchmarking environment for evaluating and optimizing memory management in long-horizon conversations with AI assistants. The framework addresses limitations in current memory evaluation methods by enabling on-policy testing with LLM-simulated users and revealing performance gaps in existing memory systems like RAG and long-context LLMs.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

OrbitFlow: SLO-Aware Long-Context LLM Serving with Fine-Grained KV Cache Reconfiguration

OrbitFlow is a new KV cache management system for long-context LLM serving that uses adaptive memory allocation and fine-grained optimization to improve performance. The system achieves up to 66% better SLO attainment and 3.3x higher throughput by dynamically managing GPU memory usage during token generation.
