9 articles tagged with #ocr. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AIBullisharXiv โ CS AI ยท Mar 46/104
๐ง A large-scale benchmarking study finds that powerful Multimodal Large Language Models (MLLMs) can extract information from business documents using image-only input, potentially eliminating the need for traditional OCR preprocessing. The research demonstrates that well-designed prompts and instructions can further enhance MLLM performance in document processing tasks.
AINeutralarXiv โ CS AI ยท Apr 106/10
๐ง Researchers introduce Guardian Parser Pack, an AI-driven system that extracts and normalizes missing-person intelligence from heterogeneous documents using LLM-assisted parsing combined with schema validation. The system achieved 86.64% F1 score on manual evaluation while improving data completeness to 96.97%, demonstrating practical viability of probabilistic AI in high-stakes investigative workflows.
AIBullishMarkTechPost ยท Mar 156/10
๐ง Zhipu AI has released GLM-OCR, a compact 0.9B parameter multimodal model designed to solve real-world document parsing challenges including OCR, table extraction, formula recognition, and key information extraction. The model aims to address the engineering difficulties of processing actual documents rather than clean demo images while maintaining resource efficiency.
AIBullisharXiv โ CS AI ยท Mar 36/107
๐ง NovaLAD is a new CPU-optimized document extraction pipeline that uses dual YOLO models for converting unstructured documents into structured formats for AI applications. The system achieves 96.49% TEDS and 98.51% NID on benchmarks, outperforming existing commercial and open-source parsers while running efficiently on CPU without requiring GPU resources.
AIBullisharXiv โ CS AI ยท Feb 276/105
๐ง Researchers introduce MoDora, an AI-powered system that uses tree-based analysis to understand and answer questions about semi-structured documents containing mixed data elements like tables, charts, and text. The system addresses challenges in processing fragmented OCR data and hierarchical document structures, achieving 5.97%-61.07% accuracy improvements over existing baselines.
AIBullisharXiv โ CS AI ยท Mar 115/10
๐ง The DIMT 2025 Challenge advances research in Document Image Machine Translation, featuring OCR-free and OCR-based tracks for translating text in complex document layouts. The competition attracted 69 teams with 27 valid submissions, demonstrating that large-model approaches show promise for handling complex document translation tasks.
AINeutralHugging Face Blog ยท Apr 224/103
๐ง The article discusses the finetuning process of olmOCR, an optical character recognition engine, to improve its accuracy and reliability. This represents an advancement in AI-powered text recognition technology that could have applications across various digital platforms.
AINeutralHugging Face Blog ยท Oct 23/104
๐ง The article appears to discuss SOTA (State of the Art) OCR technology implementation using Core ML and dots.ocr framework. However, the article body is empty, preventing detailed analysis of the technical implementation or market implications.
AINeutralHugging Face Blog ยท Oct 211/107
๐ง The article title suggests content about improving Optical Character Recognition (OCR) pipelines using open-source models, but the article body appears to be empty or not provided.