
FF-JEPA: Long-Horizon Planning in World Models with Latent Planners

arXiv – CS AI | Sergi Masip, Jonathan Swinnen, Yutong Hu, Renaud Detry, Tinne Tuytelaars | June 9, 2026 at 04:00 AM

🤖AI Summary

Researchers propose FF-JEPA, a hierarchical world model architecture that enables long-horizon planning by combining action-conditioned and action-free latent planners, eliminating the need for explicit goal images and addressing computational inefficiencies in previous JEPA-based planning approaches.

Analysis

FF-JEPA addresses a fundamental challenge in embodied AI systems: how to plan effectively over extended time horizons without prohibitive computational costs. Earlier JEPA-based planners rely on Cross-Entropy Method (CEM) optimization over action trajectories. That search becomes intractable as the planning horizon grows, and it requires a precise goal-state image that may be unavailable in real-world scenarios. The proposed hierarchical architecture decouples the planning problem by introducing an action-free latent planner that predicts intermediate subgoals, transforming long-horizon planning into a sequence of shorter optimization problems.
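
The decomposition described above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the authors' implementation: the latent state is a 2-D point, `dynamics` is a hand-written stand-in for a learned action-conditioned model, `propose_subgoals` is a hand-written stand-in for a learned action-free planner, and the task is given as a latent `target` purely so the toy has something to aim at (this summary does not say how FF-JEPA specifies tasks without goal images). All function names, parameters, and default values are hypothetical.

```python
import random


def dynamics(state, action):
    """Toy action-conditioned model (assumption): the action shifts a 2-D latent."""
    return (state[0] + action[0], state[1] + action[1])


def propose_subgoals(start, target, num_subgoals):
    """Toy action-free planner (assumption): evenly spaced latent waypoints.

    A learned planner would predict these; no actions are involved here.
    """
    return [
        (start[0] + (target[0] - start[0]) * k / num_subgoals,
         start[1] + (target[1] - start[1]) * k / num_subgoals)
        for k in range(1, num_subgoals + 1)
    ]


def rollout_cost(state, actions, goal):
    """Squared latent distance to the goal after rolling the actions forward."""
    for action in actions:
        state = dynamics(state, action)
    return (state[0] - goal[0]) ** 2 + (state[1] - goal[1]) ** 2


def cem_plan(state, goal, horizon, rng, samples=64, elites=8, iters=5, max_step=0.3):
    """Cross-Entropy Method over one short action sequence toward one goal."""
    dim = 2
    mean = [[0.0] * dim for _ in range(horizon)]
    std = [[max_step] * dim for _ in range(horizon)]
    for _iteration in range(iters):
        scored = []
        for _sample in range(samples):
            # Sample a candidate sequence and clip each action component.
            seq = [
                tuple(max(-max_step, min(max_step, rng.gauss(mean[t][d], std[t][d])))
                      for d in range(dim))
                for t in range(horizon)
            ]
            scored.append((rollout_cost(state, seq, goal), seq))
        scored.sort(key=lambda pair: pair[0])
        top = [seq for _cost, seq in scored[:elites]]
        # Refit the sampling distribution to the lowest-cost sequences.
        for t in range(horizon):
            for d in range(dim):
                values = [seq[t][d] for seq in top]
                m = sum(values) / elites
                mean[t][d] = m
                std[t][d] = (sum((v - m) ** 2 for v in values) / elites) ** 0.5 + 1e-3
    return [tuple(step) for step in mean]


def hierarchical_plan(start, target, num_subgoals=4, horizon=5, seed=0):
    """Chain short CEM problems, one per subgoal, instead of one long search."""
    rng = random.Random(seed)
    state, actions = start, []
    for subgoal in propose_subgoals(start, target, num_subgoals):
        segment = cem_plan(state, subgoal, horizon, rng)
        for action in segment:
            state = dynamics(state, action)
        actions.extend(segment)
    return actions, state
```

Calling `hierarchical_plan((0.0, 0.0), (4.0, 2.0))` returns a 20-step action sequence and the latent state it reaches. Each CEM call searches only `horizon` steps at a time, whereas a flat planner would have to search all `num_subgoals * horizon` steps jointly.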

This work builds on growing recognition within the world modeling community that flat architectures struggle with temporal abstraction. Recent advances in hierarchical reinforcement learning and goal-conditioned policies suggest that decomposing complex tasks improves both computational efficiency and generalization. FF-JEPA's elimination of explicit goal image requirements represents a practical advancement toward more flexible, self-supervised planning systems.

For the broader AI development ecosystem, this research demonstrates progress toward autonomous systems capable of open-ended task planning without dense goal specifications. The preliminary validation on PushT benchmarks shows the approach prevents the performance collapse typical of flat models on extended tasks. This matters for robotics applications where real-world tasks inherently span multiple phases with implicit intermediate objectives.

Future validation should assess scalability to higher-dimensional action spaces and more complex visual environments. The interplay between subgoal quality and final task performance remains underexplored, as does generalization across diverse task distributions. Whether this hierarchical approach maintains efficiency advantages when handling dynamic environments with stochastic outcomes will determine its practical applicability.

Key Takeaways

→FF-JEPA uses dual forward models with action-free subgoal prediction to enable long-horizon planning without explicit goal images
→Hierarchical decomposition transforms intractable long-horizon optimization into tractable short-term planning problems
→Preliminary results show FF-JEPA overcomes performance collapse that flat world models experience on extended planning tasks
→The approach reduces computational requirements compared to running trajectory optimization, such as the Cross-Entropy Method, over the full planning horizon
→Architecture combines action-conditioned dynamics with action-free latent planning for improved temporal abstraction