y0news

#memory-optimization News & Analysis

36 articles tagged with #memory-optimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 11 · 6/10

MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning

Researchers introduce MemSearcher, an AI agent framework that optimizes how large language models handle multi-turn interactions by maintaining compact memory instead of concatenating full conversation history. The approach uses a novel multi-context GRPO training method and demonstrates superior performance while maintaining stable token counts, reducing computational overhead.

AI · Bullish · arXiv – CS AI · Apr 15 · 6/10

Aethon: A Reference-Based Replication Primitive for Constant-Time Instantiation of Stateful AI Agents

Aethon is a new systems primitive that enables stateful AI agents to be instantiated in near-constant time by using reference-based replication instead of full materialization. This architectural innovation addresses latency and memory overhead constraints in existing AI runtime systems, making it possible to spawn, specialize, and govern agents at production scale.

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

Artifacts as Memory Beyond the Agent Boundary

Researchers formalize how agents can use environmental artifacts as external memory to reduce computational requirements in reinforcement learning tasks. The study demonstrates that spatial observations can implicitly serve as memory substitutes, allowing agents to learn effective policies with less internal memory capacity than previously thought necessary.

AI · Bullish · arXiv – CS AI · Apr 7 · 6/10

REAM: Merging Improves Pruning of Experts in LLMs

Researchers propose REAM (Router-weighted Expert Activation Merging), a new method for compressing large language models that groups and merges expert weights instead of pruning them. The technique preserves model performance better than existing pruning methods while reducing memory requirements for deployment.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

Self-Indexing KVCache: Predicting Sparse Attention from Compressed Keys

Researchers propose a novel self-indexing KV cache system that unifies compression and retrieval for efficient sparse attention in large language models. The method uses 1-bit vector quantization and integrates with FlashAttention to reduce memory bottlenecks in long-context LLM inference.

AI · Bullish · arXiv – CS AI · Mar 12 · 6/10

LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation

Researchers have developed LookaheadKV, a new framework that significantly improves memory efficiency in large language models by intelligently evicting less important cached data. The method achieves superior accuracy while reducing computational costs by up to 14.5x compared to existing approaches, making long-context AI tasks more practical.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8

MemPO: Self-Memory Policy Optimization for Long-Horizon Agents

Researchers propose MemPO (Self-Memory Policy Optimization), a new algorithm that enables AI agents to autonomously manage their memory during long-horizon tasks. The method reports F1 score gains of 25.98% over base models while reducing token usage by 67.58%.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding

FluxMem is a new training-free framework for streaming video understanding that uses hierarchical memory compression to reduce computational costs. The system achieves state-of-the-art performance on video benchmarks while reducing latency by 69.9% and GPU memory usage by 34.5%.

AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 12

MEGS²: Memory-Efficient Gaussian Splatting via Spherical Gaussians and Unified Pruning

Researchers introduce MEGS², a new memory-efficient framework for 3D Gaussian Splatting that reduces memory consumption by 50% for static rendering and 40% for real-time rendering. The approach enables 3D rendering on edge devices by replacing memory-intensive spherical harmonics with lightweight spherical Gaussian lobes and applying a unified pruning optimization.

AI · Bullish · Hugging Face Blog · Jul 30 · 6/10 · 5

Memory-efficient Diffusion Transformers with Quanto and Diffusers

The article discusses a memory-efficient implementation of Diffusion Transformers using the Quanto quantization library together with Diffusers. This enables running large-scale AI image generation models with reduced memory requirements, making them more accessible for deployment.

AI · Neutral · arXiv – CS AI · Feb 27 · 4/10 · 8

Generalized Rapid Action Value Estimation in Memory-Constrained Environments

Researchers introduce the GRAVE2, GRAVER, and GRAVER2 algorithms, which extend Generalized Rapid Action Value Estimation (GRAVE) for game-playing AI. These new variants dramatically reduce memory requirements while maintaining the same playing strength as the original GRAVE algorithm.
