AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 4
🧠A large-scale benchmarking study finds that powerful Multimodal Large Language Models (MLLMs) can extract information from business documents using image-only input, potentially eliminating the need for traditional OCR preprocessing. The research demonstrates that well-designed prompts and instructions can further enhance MLLM performance in document processing tasks.
AI · Bullish · Hugging Face Blog · May 18 · 6/10
🧠PaddleOCR 3.5 introduces a Transformers backend for optical character recognition and document parsing tasks, enabling developers to leverage modern deep learning architectures for improved accuracy and flexibility in text extraction workflows.
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers introduce Guardian Parser Pack, an AI-driven system that extracts and normalizes missing-person intelligence from heterogeneous documents using LLM-assisted parsing combined with schema validation. The system achieved an 86.64% F1 score on manual evaluation while improving data completeness to 96.97%, demonstrating the practical viability of probabilistic AI in high-stakes investigative workflows.
AI · Bullish · MarkTechPost · Mar 15 · 6/10
🧠Zhipu AI has released GLM-OCR, a compact 0.9B parameter multimodal model designed to solve real-world document parsing challenges including OCR, table extraction, formula recognition, and key information extraction. The model aims to address the engineering difficulties of processing actual documents rather than clean demo images while maintaining resource efficiency.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠NovaLAD is a new CPU-optimized document extraction pipeline that uses dual YOLO models to convert unstructured documents into structured formats for AI applications. The system achieves 96.49% TEDS and 98.51% NID on benchmarks, outperforming existing commercial and open-source parsers while running efficiently on CPU without a GPU.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 5
🧠Researchers introduce MoDora, an AI-powered system that uses tree-based analysis to understand and answer questions about semi-structured documents containing mixed data elements such as tables, charts, and text. The system addresses challenges in processing fragmented OCR data and hierarchical document structures, achieving accuracy improvements of 5.97% to 61.07% over existing baselines.
AI · Bullish · arXiv – CS AI · Mar 11 · 5/10
🧠The DIMT 2025 Challenge advances research in Document Image Machine Translation, featuring OCR-free and OCR-based tracks for translating text in complex document layouts. The competition attracted 69 teams with 27 valid submissions, demonstrating that large-model approaches show promise for handling complex document translation tasks.
AI · Neutral · Hugging Face Blog · Apr 22 · 4/10 · 3
🧠The article discusses the finetuning process of olmOCR, an optical character recognition engine, to improve its accuracy and reliability. This represents an advancement in AI-powered text recognition technology that could have applications across various digital platforms.
AI · Neutral · Hugging Face Blog · Oct 2 · 3/10 · 4
🧠The article appears to discuss a state-of-the-art (SOTA) OCR implementation using Core ML and dots.ocr. However, the article body is empty, which prevents detailed analysis of the technical implementation or market implications.
AI · Neutral · Hugging Face Blog · Oct 21 · 1/10 · 7
🧠The article title suggests content about improving Optical Character Recognition (OCR) pipelines using open-source models, but the article body appears to be empty or not provided.