AI · Neutral · arXiv – CS AI · 9h ago · 6/10
Benchmarking Open-Source Layout Detection Models for Data Snapshot Extraction from Institutional Documents
Researchers have developed a benchmark dataset and evaluation framework for extracting data snapshots (figures and tables) from institutional documents such as World Bank reports. The study finds that current open-source layout detection models generalize poorly to these operational documents: they struggle to distinguish analytical from non-analytical content and often fragment composite visual artifacts.
🏢 Hugging Face