#memory-optimization News & Analysis

47 articles tagged with #memory-optimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

Learning What to Remember: Observability-Safe Memory Retention via Constrained Optimization for Long-Horizon Language Agents

Researchers introduce OSL-MR, a framework that treats memory retention for long-horizon language agents as a single constrained optimization problem rather than a series of local decisions. The approach combines learned evidence valuation with heuristic scoring while respecting real-world observability constraints, and it outperforms existing methods on benchmark datasets.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

GRID: Scaling Task-Agnostic Inference in Continual Prompt Tuning

Researchers introduce GRID, a framework addressing scalability and task-agnostic inference challenges in continual prompt tuning for large language models. The method combines output-aware decoding with gradient-guided prompt selection to improve backward transfer while reducing memory consumption across multiple LLM architectures.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

Scaling Neural Network Verification with Tensor Parallelism and Fully Sharded Data Parallelism

Researchers have adapted GPU parallelism techniques to neural network verification, enabling formal safety proofs on larger models. Fully Sharded Data Parallelism (FSDP) reduces memory usage by 80-90% while maintaining identical verification results, though Tensor Parallelism trades some bound quality for memory efficiency.


AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

DySink: Dynamic Frame Sinks for Autoregressive Long Video Generation

Researchers introduce DySink, a novel framework for autoregressive long video generation that dynamically selects relevant historical frames instead of using static early-frame anchors. The method addresses the problem of outdated context degrading video quality and introduces a sink anomaly gate to prevent content collapse, demonstrating improvements in temporal consistency for minute-long videos.

AI · Bullish · arXiv – CS AI · Jun 5 · 6/10

Enhancing Software Engineering Through Closed-Loop Memory Optimization

Researchers introduce MemOp, a closed-loop memory optimization framework that enables AI software engineering agents to retain and reuse experiences across tasks. The system achieves up to 5.25% improvement in success rates and reduces computational costs by 9.79% while establishing a principled method for evaluating memory utility in autonomous agents.

AI · Bullish · arXiv – CS AI · Jun 2 · 6/10

STaR-KV: Spatio-Temporal Adaptive Re-weighting for KV Cache Compression in GUI Vision-Language Models

Researchers introduce STaR-KV, a training-free compression framework that reduces key-value cache memory consumption in vision-language GUI agents by up to 40% while maintaining accuracy. The method addresses a critical bottleneck where models like UI-TARS-1.5-7B consume prohibitive GPU memory during multi-step interactions, enabling more practical deployment on standard accelerators.

AI · Bullish · arXiv – CS AI · Jun 1 · 6/10

Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models

Researchers propose Boundary-Guided Policy Optimization (BGPO), a memory-efficient reinforcement learning algorithm for diffusion large language models that addresses a critical bottleneck in likelihood function approximation. By constructing a specially designed lower bound that enables gradient accumulation across samples while maintaining mathematical equivalence to traditional objectives, BGPO achieves superior performance on math, coding, and planning tasks with significantly reduced memory overhead.
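
The gradient accumulation mentioned in this summary rests on a general property: when an objective is a plain sum over samples, gradients computed on small micro-batches can be added up to give the full-batch gradient, so only one micro-batch has to sit in memory at a time. A minimal sketch with a toy squared-error loss; the loss, function names, and data are illustrative assumptions, not BGPO's actual lower bound:

```python
def summed_gradient(w, xs, ys):
    """Gradient w.r.t. w of sum_i (w * x_i - y_i)^2, not yet normalized."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys))


def full_batch_gradient(w, xs, ys):
    """Gradient of the mean loss, computed over all samples at once."""
    return summed_gradient(w, xs, ys) / len(xs)


def accumulated_gradient(w, xs, ys, micro_batch_size):
    """Same gradient, built up one micro-batch at a time."""
    total = 0.0
    for start in range(0, len(xs), micro_batch_size):
        stop = start + micro_batch_size
        # Only this slice needs to be processed (and held in memory) per step.
        total += summed_gradient(w, xs[start:stop], ys[start:stop])
    return total / len(xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.9]
full = full_batch_gradient(1.5, xs, ys)
accumulated = accumulated_gradient(1.5, xs, ys, micro_batch_size=2)
```

The two values agree up to floating-point rounding. An objective that is not a sum over samples would not split this way, which is presumably why the form of the bound matters here.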

AI · Bullish · arXiv – CS AI · May 29 · 6/10

Enhancing Reinforcement Learning in 3D Environments through Semantic Segmentation: A Case Study in ViZDoom

Researchers propose semantic segmentation-based input representations to address memory and learning challenges in reinforcement learning for 3D environments, demonstrating 66-98% memory reduction in ViZDoom experiments while improving agent performance through enhanced visual information processing.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory

Researchers introduce BudgetMem, a runtime memory framework for LLM agents that uses query-aware routing to dynamically allocate computational resources across memory modules at three cost tiers. The system employs reinforcement learning to optimize the performance-cost trade-off, demonstrating improvements over static memory approaches across multiple benchmark datasets.

AI · Bullish · arXiv – CS AI · May 12 · 6/10

CERSA: Cumulative Energy-Retaining Subspace Adaptation for Memory-Efficient Fine-Tuning

Researchers introduce CERSA, a novel parameter-efficient fine-tuning method that uses singular value decomposition to reduce memory consumption while fine-tuning large language models. The technique outperforms existing methods like LoRA by capturing more rank characteristics of weight modifications while requiring substantially less memory for frozen weights.
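
The "cumulative energy-retaining" part of the name suggests a familiar rank-selection rule for singular value decompositions: keep the smallest number of leading singular values whose squared sum reaches a chosen fraction of the total. A minimal sketch of that generic rule follows; the function name, the 0.9 default, and the example values are assumptions for illustration, and CERSA's actual criterion may differ:

```python
def rank_for_energy(singular_values, energy_threshold=0.9):
    """Smallest rank k whose leading singular values retain the requested
    fraction of the total energy (sum of squared singular values)."""
    if not 0.0 < energy_threshold <= 1.0:
        raise ValueError("energy_threshold must be in (0, 1]")
    squared = sorted((s * s for s in singular_values), reverse=True)
    total = sum(squared)
    if total == 0.0:
        return 0  # an all-zero matrix needs no components
    cumulative = 0.0
    for k, energy in enumerate(squared, start=1):
        cumulative += energy
        if cumulative / total >= energy_threshold:
            return k
    return len(squared)


# Hypothetical singular values of a weight matrix.
example = [10.0, 5.0, 2.0, 1.0, 0.5]
rank_90 = rank_for_energy(example, 0.9)    # two components cover about 96%
rank_99 = rank_for_energy(example, 0.99)   # three components cover about 99%
```

A lower rank means fewer stored factors, which is where a memory saving of this kind would come from.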

AI · Neutral · arXiv – CS AI · May 11 · 6/10

How to Compress KV Cache in RL Post-Training? Shadow Mask Distillation for Memory-Efficient Alignment

Researchers propose Shadow Mask Distillation, which aims to make KV cache compression workable during reinforcement learning post-training of large language models and so ease the memory bottleneck of that stage. The technique targets the off-policy bias that emerges when compressed contexts are used for rollout generation but full contexts are used for parameter updates, a mismatch that amplifies instability in RL optimization.

AI · Bullish · arXiv – CS AI · May 11 · 6/10

MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning

Researchers introduce MemSearcher, an AI agent framework that optimizes how large language models handle multi-turn interactions by maintaining compact memory instead of concatenating full conversation history. The approach uses a novel multi-context GRPO training method and demonstrates superior performance while maintaining stable token counts, reducing computational overhead.
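
The difference between concatenating history and keeping a compact memory can be seen with simple token accounting. The sketch below only counts tokens; the fixed `memory_budget` cap is a hypothetical stand-in for the learned memory update, which the summary does not describe:

```python
def concatenated_context_sizes(turn_tokens):
    """Context size per turn when the full history is concatenated."""
    sizes, total = [], 0
    for tokens in turn_tokens:
        total += tokens
        sizes.append(total)
    return sizes


def compact_memory_context_sizes(turn_tokens, memory_budget):
    """Context size per turn when only a bounded memory plus the
    current turn is fed to the model."""
    sizes, memory = [], 0
    for tokens in turn_tokens:
        sizes.append(memory + tokens)
        # Stand-in for memory rewriting: memory never exceeds the budget.
        memory = min(memory_budget, memory + tokens)
    return sizes


turns = [200, 300, 250, 400]  # hypothetical tokens added per turn
full_history = concatenated_context_sizes(turns)
compact = compact_memory_context_sizes(turns, memory_budget=256)
```

With full history the context grows with every turn, while the compact variant never exceeds the budget plus the largest single turn.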

AI · Bullish · arXiv – CS AI · Apr 15 · 6/10

Aethon: A Reference-Based Replication Primitive for Constant-Time Instantiation of Stateful AI Agents

Aethon is a new systems primitive that enables stateful AI agents to be instantiated in near-constant time by using reference-based replication instead of full materialization. This architectural innovation addresses latency and memory overhead constraints in existing AI runtime systems, making it possible to spawn, specialize, and govern agents at production scale.

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

Artifacts as Memory Beyond the Agent Boundary

Researchers formalize how agents can use environmental artifacts as external memory to reduce computational requirements in reinforcement learning tasks. The study demonstrates that spatial observations can implicitly serve as memory substitutes, allowing agents to learn effective policies with less internal memory capacity than previously thought necessary.

AI · Bullish · arXiv – CS AI · Apr 7 · 6/10

REAM: Merging Improves Pruning of Experts in LLMs

Researchers propose REAM (Router-weighted Expert Activation Merging), a new method for compressing large language models that groups and merges expert weights instead of pruning them. The technique preserves model performance better than existing pruning methods while reducing memory requirements for deployment.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

Self-Indexing KVCache: Predicting Sparse Attention from Compressed Keys

Researchers propose a novel self-indexing KV cache system that unifies compression and retrieval for efficient sparse attention in large language models. The method uses 1-bit vector quantization and integrates with FlashAttention to reduce memory bottlenecks in long-context LLM inference.
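
To give a feel for how 1-bit codes can double as a retrieval index, the sketch below uses plain sign quantization: each key is reduced to one bit per dimension, and a query is matched against keys by counting agreeing signs. This is a simpler technique swapped in for illustration, not the paper's vector quantizer, and all names and data are hypothetical:

```python
def sign_bits(vector):
    """Pack the sign of each component into an int (bit i set if x_i >= 0)."""
    bits = 0
    for i, x in enumerate(vector):
        if x >= 0:
            bits |= 1 << i
    return bits


def sign_agreement(a_bits, b_bits, dim):
    """Number of dimensions in which two sign codes agree."""
    return dim - bin(a_bits ^ b_bits).count("1")


def select_top_k(query, keys, k):
    """Indices of the k keys whose sign codes best match the query's."""
    dim = len(query)
    query_code = sign_bits(query)
    key_codes = [sign_bits(key) for key in keys]  # the compressed cache
    ranked = sorted(
        range(len(keys)),
        key=lambda i: sign_agreement(query_code, key_codes[i], dim),
        reverse=True,
    )
    return sorted(ranked[:k])


query = [1.0, -2.0, 0.5, 3.0]
keys = [
    [0.9, -1.0, 0.1, 2.0],    # same signs as the query in all 4 dimensions
    [-1.0, 2.0, -0.5, -3.0],  # opposite signs everywhere
    [1.0, 1.0, 1.0, 1.0],     # agrees in 3 dimensions
    [1.0, 1.0, -1.0, 1.0],    # agrees in 2 dimensions
]
selected = select_top_k(query, keys, k=2)
```

Only the selected keys would then take part in full-precision attention, so the same compact codes serve both storage and lookup.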

AI · Bullish · arXiv – CS AI · Mar 12 · 6/10

LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation

Researchers have developed LookaheadKV, a new framework that significantly improves memory efficiency in large language models by intelligently evicting less important cached data. The method achieves superior accuracy while reducing computational costs by up to 14.5x compared to existing approaches, making long-context AI tasks more practical.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8

MemPO: Self-Memory Policy Optimization for Long-Horizon Agents

Researchers propose MemPO (Self-Memory Policy Optimization), a new algorithm that enables AI agents to autonomously manage their memory during long-horizon tasks. The method reports an F1 gain of 25.98% over base models while reducing token usage by 67.58%.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding

FluxMem is a new training-free framework for streaming video understanding that uses hierarchical memory compression to reduce computational costs. The system achieves state-of-the-art performance on video benchmarks while reducing latency by 69.9% and GPU memory usage by 34.5%.

AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 12

MEGS²: Memory-Efficient Gaussian Splatting via Spherical Gaussians and Unified Pruning

Researchers introduce MEGS², a new memory-efficient framework for 3D Gaussian Splatting that reduces memory consumption by 50% for static rendering and 40% for real-time rendering. The breakthrough enables 3D rendering on edge devices by replacing memory-intensive spherical harmonics with lightweight spherical Gaussian lobes and implementing unified pruning optimization.

AI · Bullish · Hugging Face Blog · Jul 30 · 6/10 · 5

Memory-efficient Diffusion Transformers with Quanto and Diffusers

The article discusses a memory-efficient implementation of Diffusion Transformers using the Quanto quantization library together with Diffusers. Quantizing the model makes it possible to run large-scale AI image generation models with reduced memory requirements, making them more accessible for deployment.

AI · Neutral · arXiv – CS AI · Feb 27 · 4/10 · 8

Generalized Rapid Action Value Estimation in Memory-Constrained Environments

Researchers introduce the GRAVE2, GRAVER, and GRAVER2 algorithms, which extend Generalized Rapid Action Value Estimation (GRAVE) for game-playing AI. The new variants dramatically reduce memory requirements while maintaining the same playing strength as the original GRAVE algorithm.
