arXiv – CS AI · 15h ago
Doc-CoB: Enhancing Document Understanding with Visual Chain-of-Boxes Reasoning
Researchers introduce Doc-CoB, a framework that improves how AI models understand documents by progressively focusing on relevant layout regions while maintaining global page context. The approach pairs coarse-to-fine visual reasoning with multimodal large language models (MLLMs), and the authors report significant performance gains across seven benchmarks.
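The summary gives no implementation details, so the following is only a minimal, hypothetical Python sketch of the general coarse-to-fine "chain of boxes" idea, not Doc-CoB's actual method. It assumes a page normalized to the unit square, a fixed 2x2 split at each step, and a hypothetical `score` function standing in for the MLLM's judgment of which region is relevant. The full-page box is kept as the first element of the chain to represent global context.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Box:
    """Axis-aligned region: (x0, y0) is top-left, (x1, y1) is bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return (self.x1 - self.x0) * (self.y1 - self.y0)

    def center(self) -> tuple:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    def split(self, grid: int) -> List["Box"]:
        """Split this box into grid x grid equal sub-boxes (row-major order)."""
        w = (self.x1 - self.x0) / grid
        h = (self.y1 - self.y0) / grid
        return [
            Box(self.x0 + c * w, self.y0 + r * h,
                self.x0 + (c + 1) * w, self.y0 + (r + 1) * h)
            for r in range(grid)
            for c in range(grid)
        ]


def chain_of_boxes(page: Box, score: Callable[[Box], float],
                   steps: int, grid: int = 2) -> List[Box]:
    """Hypothetical coarse-to-fine chain.

    chain[0] is the whole page (global context); each later element is the
    highest-scoring sub-box of the previous one, so the focus narrows step by step.
    """
    chain = [page]
    current = page
    for _ in range(steps):
        current = max(current.split(grid), key=score)
        chain.append(current)
    return chain


# Usage: a toy score that prefers boxes whose center is near a "relevant"
# point on the page (a placeholder for a model's relevance judgment).
page = Box(0.0, 0.0, 1.0, 1.0)
target = (0.7, 0.2)


def score(b: Box) -> float:
    return -math.dist(b.center(), target)


chain = chain_of_boxes(page, score, steps=3)
for box in chain:
    print(box, round(box.area(), 6))
```

Keeping the full page at the head of the chain is one simple way to preserve global context alongside the progressively smaller focus regions; a real system would presumably feed both the page and the selected crops to the model.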