AI · Neutral · arXiv – CS AI · 14h ago · 6/10
MPDocBench-Parse: Benchmarking Practical Multi-page Document Parsing
Researchers introduce MPDocBench-Parse, a benchmark for evaluating multi-page document parsing systems in realistic, complex scenarios. It comprises 433 manually annotated documents totaling 3,246 pages across 15 document types. Results on the benchmark indicate that existing AI models excel at basic text extraction but struggle with semantic continuity, visual content preservation, and hierarchical structure recovery.