PRISM: Perception Reasoning Interleaved for Sequential Decision Making
PRISM is a new AI framework that improves embodied agents by coupling Vision-Language Models with Large Language Models through dynamic question-answer interactions, addressing the perception-reasoning gap in multimodal AI systems. The framework demonstrates significant performance improvements on benchmark tasks like ALFWorld and R2R, showing that interactive, goal-oriented perception yields superior understanding compared to standalone visual analysis.