y0news

#long-horizon-planning News & Analysis

7 articles tagged with #long-horizon-planning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy

Researchers introduce OneWM-VLA, a new approach to vision-language-action models that compresses visual input to a single token per frame while maintaining or improving long-horizon task performance. The method achieves significant improvements on robotics benchmarks including 61.3% success on MetaWorld MT50 and 60% on real-world cloth folding tasks, demonstrating that excessive visual bandwidth in world models may be unnecessary.

AI · Bullish · arXiv – CS AI · May 4 · 7/10

Thinking in Text and Images: Interleaved Vision-Language Reasoning Traces for Long-Horizon Robot Manipulation

Researchers introduce Interleaved Vision-Language Reasoning (IVLR), a new AI framework that combines text and visual planning for robotic manipulation tasks. The system generates explicit reasoning traces alternating between textual subgoals and visual keyframes, achieving 95.5% success on LIBERO benchmarks and demonstrating that multimodal reasoning significantly outperforms text-only or vision-only approaches.

AI · Bearish · arXiv – CS AI · Mar 26 · 7/10

Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments

Researchers introduced EnterpriseArena, the first benchmark testing whether AI agents can function as CFOs by allocating resources in complex enterprise environments over 132 months. Testing on eleven advanced LLMs revealed poor performance, with only 16% of runs surviving the full simulation period, highlighting significant capability gaps in long-term resource allocation under uncertainty.

AI · Bullish · arXiv – CS AI · Mar 26 · 7/10

Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering

Researchers have developed ML-Master 2.0, an autonomous AI agent for ultra-long-horizon machine learning engineering tasks, built on a Hierarchical Cognitive Caching architecture. The system achieved a 56.44% medal rate on OpenAI's MLE-Bench, demonstrating the ability to maintain strategic coherence over experimental cycles spanning days or weeks.

AI · Bullish · arXiv – CS AI · May 11 · 6/10

Scalable Option Learning in High-Throughput Environments

Facebook Research introduces Scalable Option Learning (SOL), a hierarchical reinforcement learning algorithm that achieves 35x higher throughput than existing methods. The system was validated on complex environments including NetHack using 30 billion frames of experience, demonstrating superior performance over flat agents and suggesting that hierarchical RL can finally benefit from large-scale training.

AI · Neutral · arXiv – CS AI · May 1 · 6/10

Exploring Interaction Paradigms for LLM Agents in Scientific Visualization

Researchers evaluated eight LLM agents across three interaction paradigms—domain-specific agents, computer-use agents, and general-purpose coding agents—on scientific visualization tasks. The study reveals fundamental tradeoffs: general-purpose agents excel at task completion but consume more computational resources, while domain-specific agents offer efficiency and stability at the cost of flexibility, with persistent memory improving performance across modalities.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents

Researchers introduce HiMAC, a hierarchical reinforcement learning framework that improves LLM agent performance on long-horizon tasks by separating macro-level planning from micro-level execution. The approach demonstrates state-of-the-art results across multiple environments, showing that structured hierarchy is more effective than simply scaling model size for complex agent tasks.