PRISM is a new framework for world model-based planning that uses a lightweight neural network to extract action priors from the same dataset and representations the world model already uses, raising success rates on robotic control tasks by 32-35 percentage points without additional architectural complexity. The method folds state-conditioned confidence into the planner's sampling distribution through a closed-form probabilistic update, so that candidate actions are generated more effectively.
PRISM addresses a fundamental inefficiency in model-based reinforcement learning: while world models have become increasingly accurate at predicting future states, the process of selecting which actions to evaluate during planning remains poorly optimized. Traditional approaches either sample the action space without guidance or borrow action priors from expert demonstrations without using the confidence information those demonstrations encode. This gap between predictive accuracy and planning effectiveness has led researchers to bolt on additional components, such as independent visual encoders or large vision-language models, which increases computational overhead and parameter counts.
The research builds on JEPA-style latent world models, which have gained traction as sample-efficient alternatives to pixel-level prediction. PRISM's key innovation lies in its parsimony: rather than introducing new data sources or architectural components, it extracts action priors directly from the world model's frozen encoder using only a lightweight MLP. The framework then combines the learned prior with the planner's sampling distribution through a precision-weighted Product-of-Gaussians update, which is closed-form and adds no parameters. Because each distribution is weighted by its precision (inverse variance), the prior shifts the sampling distribution strongly only in states where it is confident.
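To make the precision-weighted update concrete, here is a minimal sketch of a Product-of-Gaussians fusion, assuming diagonal (per-dimension) Gaussians for both the planner's sampling distribution and the learned prior. The function names `fuse_gaussians` and `sample_candidates` and all numeric values are hypothetical illustrations, not taken from PRISM's implementation. For each action dimension, the fused variance is 1 / (1/var_plan + 1/var_prior) and the fused mean is the precision-weighted average of the two means.

```python
import math
import random


def fuse_gaussians(mu_plan, var_plan, mu_prior, var_prior):
    """Product of two diagonal Gaussians, computed per action dimension.

    Precisions (inverse variances) add, and the fused mean is the
    precision-weighted average of the two means.
    """
    mu_post, var_post = [], []
    for m_p, v_p, m_q, v_q in zip(mu_plan, var_plan, mu_prior, var_prior):
        prec_p, prec_q = 1.0 / v_p, 1.0 / v_q
        v = 1.0 / (prec_p + prec_q)
        mu_post.append(v * (prec_p * m_p + prec_q * m_q))
        var_post.append(v)
    return mu_post, var_post


def sample_candidates(mu, var, n, rng):
    """Draw n candidate actions from a diagonal Gaussian."""
    return [
        [rng.gauss(m, math.sqrt(v)) for m, v in zip(mu, var)]
        for _ in range(n)
    ]


# Hypothetical 2-D action space: the planner's current sampling distribution.
mu_plan, var_plan = [0.0, 0.0], [1.0, 1.0]

# Hypothetical prior (standing in for the MLP head's output):
# confident on dimension 0, uncertain on dimension 1.
mu_prior, var_prior = [1.0, 1.0], [0.25, 4.0]

mu_post, var_post = fuse_gaussians(mu_plan, var_plan, mu_prior, var_prior)

# Candidates for the planner to score with the world model.
rng = random.Random(0)
candidates = sample_candidates(mu_post, var_post, 64, rng)
```

In this toy case the fused mean moves most of the way toward the prior on the dimension where the prior is confident and only slightly on the dimension where it is not, which is the behavior the state-conditioned confidence is meant to provide. The update has no learned or tuned parameters of its own.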
For the robotics and embodied AI communities, these improvements represent meaningful progress toward real-world deployment. Cube manipulation and pushing tasks are standard benchmarks, and gains of 32-35 percentage points in success rate on them translate to substantially more reliable systems. The lack of significant inference overhead makes PRISM practical for real-time applications. This work reflects a broader trend toward extracting maximum value from existing learned representations rather than accumulating model components. The results suggest that principled probabilistic integration of uncertainty, rather than architectural expansion, can drive planning improvements in learned world models.
- PRISM improves world model-based planning success rates by 32-35 percentage points using only a lightweight MLP attached to a frozen encoder.
- The framework extracts action priors and confidence information from the same dataset and model representations, eliminating the need for separate visual encoders or vision-language models.
- A precision-weighted Product-of-Gaussians update provides parameter-free, closed-form integration of the action prior into the planner's sampling distribution.
- The method maintains architectural simplicity by building directly on standard JEPA-style latent world models, without significant inference overhead.
- The approach reflects a shift toward maximizing the value of learned representations rather than adding new architectural components to robotic control systems.