InPhyRe Discovers: Large Multimodal Models Struggle in Inductive Physical Reasoning
Researchers introduced InPhyRe, a new benchmark showing that large multimodal models (LMMs) struggle with inductive physical reasoning—their ability to apply learned physical laws to novel, unseen scenarios. Testing 13 LMMs revealed critical weaknesses: models fail to generalize parametric knowledge, perform poorly with unseen physical laws, and exhibit language bias that causes them to ignore visual inputs, raising concerns about their reliability for safety-critical applications.