#long-horizon-planning News & Analysis

12 articles tagged with #long-horizon-planning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy

Researchers introduce OneWM-VLA, a vision-language-action approach that compresses visual input to a single token per frame while maintaining or improving long-horizon task performance. The method reports 61.3% success on MetaWorld MT50 and 60% on real-world cloth folding tasks, suggesting that much of the visual bandwidth used in world models may be unnecessary.

AI · Bullish · arXiv – CS AI · May 4 · 7/10

Thinking in Text and Images: Interleaved Vision-Language Reasoning Traces for Long-Horizon Robot Manipulation

Researchers introduce Interleaved Vision-Language Reasoning (IVLR), a new AI framework that combines text and visual planning for robotic manipulation tasks. The system generates explicit reasoning traces alternating between textual subgoals and visual keyframes, achieving 95.5% success on LIBERO benchmarks and demonstrating that multimodal reasoning significantly outperforms text-only or vision-only approaches.

AI · Bearish · arXiv – CS AI · Mar 26 · 7/10

Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments

Researchers introduced EnterpriseArena, the first benchmark testing whether AI agents can function as CFOs by allocating resources in complex enterprise environments over 132 months. Testing on eleven advanced LLMs revealed poor performance, with only 16% of runs surviving the full simulation period, highlighting significant capability gaps in long-term resource allocation under uncertainty.

AI · Bullish · arXiv – CS AI · Mar 26 · 7/10

Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering

Researchers have developed ML-Master 2.0, an autonomous AI agent that targets ultra-long-horizon machine learning tasks using a Hierarchical Cognitive Caching architecture. The system achieved a 56.44% medal rate on OpenAI's MLE-Bench, demonstrating the ability to maintain strategic coherence over experimental cycles spanning days or weeks.

🏢 OpenAI

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems

Researchers introduced PlanBench-XL, a benchmark testing how LLM agents plan and execute tasks across 1,665 tools in realistic scenarios. The study reveals significant vulnerabilities in current AI systems, with performance dropping from 51.9% to 11.36% accuracy when tools fail or behave unexpectedly, exposing critical gaps in adaptive planning capabilities.

🧠 GPT-5

AI · Bullish · arXiv – CS AI · Jun 23 · 6/10

Beyond the Next Step: Variable-Length Latent World Models for Long-Horizon Planning

Researchers propose Variable-Length Latent World Models (VLWMs), a framework that predicts future environment states across action sequences of variable length rather than single steps, addressing a fundamental limitation in AI planning. The approach achieves a 13% performance improvement over existing latent world models on long-horizon control tasks through curriculum training and specialized planning methods.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

FF-JEPA: Long-Horizon Planning in World Models with Latent Planners

Researchers propose FF-JEPA, a hierarchical world model architecture that enables long-horizon planning by combining action-conditioned and action-free latent planners, eliminating the need for explicit goal images and addressing computational inefficiencies in previous JEPA-based planning approaches.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Think Like a Pilot: Fine-Grained Long-Horizon UAV Navigation

Researchers introduce FLIGHT, a benchmark for training UAV agents to follow natural language instructions with precise, continuous flight control over long-horizon tasks. The accompanying FLIGHT VLA architecture decouples high-level reasoning from low-frequency control, advancing autonomous drone navigation beyond existing discrete-action systems.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Neuro-Symbolic Learning for Long-Horizon Task Planning Under Complex Logical Constraints

Researchers present a neuro-symbolic learning framework that addresses a critical inefficiency in robotic task planning by combining neural networks with symbolic planning under complex logical constraints. The method uses bilevel optimization to learn object-importance scores while solving planning problems in pruned search spaces, reducing planning failures by 80% and planning time by 57% across multiple benchmarks and real-world robotic applications.

AI · Bullish · arXiv – CS AI · May 11 · 6/10

Scalable Option Learning in High-Throughput Environments

Facebook Research introduces Scalable Option Learning (SOL), a hierarchical reinforcement learning algorithm that achieves 35x higher throughput than existing methods. The system was validated on complex environments including NetHack using 30 billion frames of experience, demonstrating superior performance over flat agents and suggesting that hierarchical RL can finally benefit from large-scale training.

AI · Neutral · arXiv – CS AI · May 1 · 6/10

Exploring Interaction Paradigms for LLM Agents in Scientific Visualization

Researchers evaluated eight LLM agents across three interaction paradigms—domain-specific agents, computer-use agents, and general-purpose coding agents—on scientific visualization tasks. The study reveals fundamental tradeoffs: general-purpose agents excel at task completion but consume more computational resources, while domain-specific agents offer efficiency and stability at the cost of flexibility, with persistent memory improving performance across modalities.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 9

HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents

Researchers introduce HiMAC, a hierarchical reinforcement learning framework that improves LLM agent performance on long-horizon tasks by separating macro-level planning from micro-level execution. The approach demonstrates state-of-the-art results across multiple environments, showing that structured hierarchy is more effective than simply scaling model size for complex agent tasks.