AI · Bullish · arXiv – CS AI · 18h ago · 6/10
Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?
Researchers propose Robust-U1, a framework that trains Multimodal Large Language Models (MLLMs) to self-recover corrupted visual content using supervised fine-tuning followed by reinforcement learning. The authors report state-of-the-art robustness on real-world corruption benchmarks and argue that visual self-recovery is a key mechanism for maintaining MLLM performance when visual inputs are degraded.