AI · Bullish · arXiv – CS AI · 14h ago · 7/10
Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents
Researchers introduce Metacognitive Memory Policy Optimization (MMPO), a training method that improves how LLM agents manage memory across long-horizon tasks. The approach uses Belief Entropy, a self-supervised metric that measures uncertainty about task state, to provide fine-grained supervision during memory summarization. Agents trained this way reportedly maintain 97.1% performance even with 1.75M-token contexts.
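
The summary does not say how Belief Entropy is computed. As a rough illustration of what an uncertainty metric over task state could look like, the sketch below assumes the agent's belief is a categorical distribution over candidate task states and scores it with plain Shannon entropy. The function name `belief_entropy` and the example distributions are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def belief_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a categorical belief over task states.

    Assumption: this is ordinary Shannon entropy, used here only as a
    stand-in; the paper's actual Belief Entropy definition may differ.
    """
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    total = sum(probs)
    if total <= 0:
        raise ValueError("belief must have positive total mass")

    entropy = 0.0
    for p in probs:
        q = p / total  # normalize so the belief sums to 1
        if q > 0:      # treat 0 * log(0) as 0
            entropy -= q * math.log(q)
    return entropy


# Hypothetical beliefs over four candidate task states.
confident = belief_entropy([0.97, 0.01, 0.01, 0.01])  # one state dominates
uncertain = belief_entropy([0.25, 0.25, 0.25, 0.25])  # maximally unsure
```

Under this assumption, a memory summary that leaves the agent with a sharply peaked belief scores lower than one that leaves it undecided, which is the kind of signal that could serve as fine-grained supervision.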