#long-horizon-agents News & Analysis

4 articles tagged with #long-horizon-agents. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Bearish · arXiv – CS AI · Jun 23 · 7/10


Governance Decay: How Context Compaction Silently Erases Safety Constraints in Long-Horizon LLM Agents

Researchers discover that LLM agents lose safety compliance when governance constraints are compressed or summarized during long sessions, with violations rising from 0% to 59% after context compaction. The study introduces a benchmark demonstrating this 'Governance Decay' failure mode and proposes Constraint Pinning as a training-free mitigation.
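The summary names Constraint Pinning only as a training-free mitigation and gives no implementation details. The following is a minimal sketch of one plausible reading, under stated assumptions: governance constraints carry a hypothetical `pinned` flag, compaction folds older unpinned turns into a single summary, and pinned messages are carried through verbatim at the front of the compacted context. `Message`, `summarize`, and `compact` are illustrative names, and `summarize` is a crude truncation stand-in for an LLM summarizer, not the paper's method.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str
    pinned: bool = False  # hypothetical flag: True for governance constraints


def summarize(messages: list[Message]) -> Message:
    # Placeholder for an LLM summarization call. Keeping only the first
    # 40 characters of each message mimics the lossy compression that can
    # silently drop constraint details.
    joined = " | ".join(m.text[:40] for m in messages)
    return Message(role="system", text=f"[summary] {joined}")


def compact(context: list[Message], keep_recent: int = 2) -> list[Message]:
    """Fold older unpinned turns into one summary; copy pinned messages verbatim."""
    pinned = [m for m in context if m.pinned]
    unpinned = [m for m in context if not m.pinned]
    if len(unpinned) <= keep_recent:
        return pinned + unpinned
    split = len(unpinned) - keep_recent  # index-based split also handles keep_recent=0
    older, recent = unpinned[:split], unpinned[split:]
    return pinned + [summarize(older)] + recent


context = [
    Message("system", "Never delete files outside /tmp.", pinned=True),
    Message("user", "Clean up the build directory."),
    Message("assistant", "Listing files..."),
    Message("user", "Now remove old logs."),
    Message("assistant", "Removing logs in /tmp/logs."),
]
compacted = compact(context)
print(compacted[0].text)  # Never delete files outside /tmp.
```

The design choice being illustrated is that constraints never pass through the summarizer at all, so no amount of compaction can paraphrase them away; the cost is that pinned text consumes context budget on every turn.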

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10


Temporal Order Matters for Agentic Memory: Segment Trees for Long-Horizon Agents

Researchers introduce SegTreeMem, a memory architecture for long-horizon conversational AI agents that organizes conversation history in temporally ordered segment trees rather than relying purely on semantic similarity. The system improves performance across multiple benchmarks by preserving chronological order while enabling hierarchical retrieval, and ablation studies confirm that temporal ordering is critical to its effectiveness.
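To make "temporally ordered segment trees with hierarchical retrieval" concrete, here is a minimal sketch, not SegTreeMem itself. Assumptions: each leaf is one conversation turn in chronological order, each internal node's "summary" is simply the union of its children's keywords (a stand-in for whatever summaries the paper uses), and keyword overlap replaces semantic scoring. `Node`, `build`, and `retrieve` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    start: int  # index of the first turn covered (inclusive)
    end: int    # index of the last turn covered (inclusive)
    keywords: set[str]
    left: Node | None = None
    right: Node | None = None


def tokenize(text: str) -> set[str]:
    # Lowercase words with trailing punctuation stripped; empty tokens dropped.
    return {t for w in text.split() if (t := w.strip(".,?!").lower())}


def build(turns: list[str], lo: int, hi: int) -> Node:
    """Build a segment tree over turns[lo..hi]; left children are always earlier in time."""
    if lo == hi:
        return Node(lo, hi, tokenize(turns[lo]))
    mid = (lo + hi) // 2
    left, right = build(turns, lo, mid), build(turns, mid + 1, hi)
    # Hypothetical internal-node "summary": union of the children's keywords.
    return Node(lo, hi, left.keywords | right.keywords, left, right)


def retrieve(node: Node | None, query: set[str]) -> list[int]:
    """Descend only into subtrees mentioning a query term; return turn indices in time order."""
    if node is None or not (node.keywords & query):
        return []
    if node.left is None and node.right is None:
        return [node.start]
    return retrieve(node.left, query) + retrieve(node.right, query)


turns = [
    "I moved to Berlin in March.",
    "My sister visited me there.",
    "Later I moved to Lisbon.",
    "The weather in Lisbon is great.",
]
root = build(turns, 0, len(turns) - 1)
print(retrieve(root, {"moved"}))  # [0, 2]
```

Because left subtrees always precede right subtrees in time, results come back in chronological order, so the last hit for "moved" (turn 2, Lisbon) is the most recent fact. A purely similarity-ranked store could return the Berlin turn first and lose that ordering.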

AI · Neutral · arXiv – CS AI · May 29 · 6/10


S3Mem: Structured Spatiotemporal Scene-Event Memory for Long-Horizon Interactive Question Answering

Researchers introduce S3Mem, a structured memory framework that improves how AI agents retrieve information from, and answer questions about, long trajectory histories. The system outperforms standard retrieval-augmented generation by organizing trajectories into scene-event units and using anchor-sensitive retrieval, achieving better accuracy with fewer tokens across multiple interactive environments.
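As an illustration of what "scene-event units" and "anchor-sensitive retrieval" could look like, here is a minimal sketch under loudly stated assumptions; it is not the S3Mem implementation. It assumes consecutive trajectory steps in the same scene form one unit, and that "anchors" are explicit scene or object names extracted from the question. `Step`, `SceneEvent`, `segment`, and `retrieve` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import groupby


@dataclass
class Step:
    t: int
    scene: str
    action: str
    objects: tuple[str, ...]


@dataclass
class SceneEvent:
    scene: str
    start: int
    end: int
    actions: list[str]
    objects: set[str]


def segment(trajectory: list[Step]) -> list[SceneEvent]:
    """Group consecutive steps that share a scene into one scene-event unit."""
    units: list[SceneEvent] = []
    for scene, group in groupby(trajectory, key=lambda s: s.scene):
        steps = list(group)
        units.append(SceneEvent(
            scene=scene,
            start=steps[0].t,
            end=steps[-1].t,
            actions=[s.action for s in steps],
            objects={o for s in steps for o in s.objects},
        ))
    return units


def retrieve(units: list[SceneEvent], anchors: set[str]) -> list[SceneEvent]:
    """Keep only units whose scene or objects match an anchor from the question."""
    return [u for u in units if u.scene in anchors or u.objects & anchors]


trajectory = [
    Step(0, "kitchen", "open fridge", ("fridge",)),
    Step(1, "kitchen", "take apple", ("apple", "fridge")),
    Step(2, "hallway", "walk", ()),
    Step(3, "bedroom", "place apple on desk", ("apple", "desk")),
]
units = segment(trajectory)
print([u.scene for u in retrieve(units, {"desk"})])  # ['bedroom']
```

Passing only the matching units to the model, instead of every raw step, is one way a structured memory can answer "Where did the apple end up?" with fewer tokens than chunk-based retrieval over the full trajectory.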

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8


MemPO: Self-Memory Policy Optimization for Long-Horizon Agents

Researchers propose MemPO (Self-Memory Policy Optimization), an algorithm that enables AI agents to manage their own memory autonomously during long-horizon tasks. The method improves F1 score by 25.98% over base models while reducing token usage by 67.58%.