Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR
Researchers introduce Consensus Entropy (CE), a training-free metric that improves OCR quality by measuring agreement across multiple Vision-Language Models (VLMs), achieving a 42.1% F1-score improvement over existing methods. The technique enables self-verifying OCR without supervision, addressing a critical gap in automated error detection for the data-generation pipelines used in LLM training.
The paper addresses a fundamental limitation in current OCR systems: while average accuracy has improved, state-of-the-art models struggle to detect which individual predictions are unreliable. This creates downstream problems for LLM training pipelines that depend on high-quality OCR-generated data. Consensus Entropy solves this by leveraging a counterintuitive principle—correct outputs cluster together across models while errors diverge, enabling error detection without labeled validation data.
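The clustering principle can be illustrated with a minimal sketch. Here agreement is measured as the Shannon entropy of exact-match counts over the ensemble's outputs; this is a deliberate simplification, and the paper's actual CE formulation may use a finer-grained (e.g. token- or edit-distance-based) agreement measure:

```python
import math
from collections import Counter


def consensus_entropy(outputs: list[str]) -> float:
    """Shannon entropy of the distribution of distinct transcriptions.

    Low entropy means the models cluster on one answer (likely correct);
    high entropy means they diverge (likely an OCR error). Exact string
    matching is an illustrative assumption, not the paper's definition.
    """
    n = len(outputs)
    counts = Counter(outputs)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Full agreement -> entropy 0; one dissenting model -> small positive entropy.
ce_agree = consensus_entropy(["Invoice #123"] * 4)
ce_split = consensus_entropy(["Invoice #123"] * 3 + ["Invoice #l23"])
```

A threshold on this quantity then flags individual predictions as reliable or suspect without any labeled validation data.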
The broader context involves the explosive growth of multimodal AI systems. As companies scale LLM training, they require massive amounts of clean text extracted from images and documents. Manual quality control becomes prohibitively expensive, and existing automated verification methods like VLM-as-Judge prove less effective than ensemble agreement signals. This research reflects an industry trend toward leveraging model disagreement as a reliability signal.
For practitioners and infrastructure developers, CE-OCR offers immediate practical value. The framework requires no retraining, integrates with existing VLMs as a plug-and-play layer, and reduces computational overhead through adaptive routing. A 42.1% improvement in quality verification directly impacts data pipeline costs and final model performance. Organizations processing large document volumes can deploy this immediately to reduce downstream errors in training data.
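The plug-and-play routing idea could look like the following sketch: accept the ensemble's majority transcription when consensus entropy is low, and escalate to a stronger (more expensive) model only when the cheap ensemble disagrees. The `threshold` value and `escalate` callback are hypothetical illustrations, not the paper's API:

```python
import math
from collections import Counter
from typing import Callable, Optional


def consensus_entropy(outputs: list[str]) -> float:
    # Entropy of exact-match output counts (illustrative simplification).
    n = len(outputs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(outputs).values())


def route_ocr(outputs: list[str],
              threshold: float = 0.5,
              escalate: Optional[Callable[[], str]] = None) -> Optional[str]:
    """Adaptive routing: accept the majority vote when the ensemble agrees,
    otherwise fall back to a stronger model via `escalate`.

    Hypothetical policy; the paper's routing criterion may differ.
    """
    if consensus_entropy(outputs) <= threshold:
        return Counter(outputs).most_common(1)[0][0]
    return escalate() if escalate is not None else None


# Cheap ensemble agrees -> no escalation cost is incurred.
text = route_ocr(["Total: $42.00", "Total: $42.00", "Total: $42.00"])
```

Because the expensive model runs only on low-agreement pages, most documents are handled by the cheap ensemble alone, which is where the computational-overhead reduction comes from.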
The research opens questions about optimal ensemble composition and whether CE principles generalize beyond OCR to other structured prediction tasks. The availability of open-source code accelerates adoption, potentially making this a standard component in data preparation workflows.
- Consensus Entropy measures model agreement entropy to detect OCR errors without training or labeled data
- CE-OCR improves quality-verification F1 scores by 42.1% compared to VLM-as-Judge approaches
- The framework is model-agnostic and requires no retraining, enabling immediate integration into existing pipelines
- Ensemble disagreement signals reliability better than single-model confidence scores for OCR verification
- The technique addresses a critical gap in automated quality control for LLM training-data pipelines