Unification of Closed-Open Industrial Detection Scenarios: New Large-Scale Benchmarks, Challenges and Baselines
Researchers introduce MMIOC-1M, a large-scale industrial defect detection benchmark with over one million samples across 351 defect categories, alongside RTVPNet, a novel approach using text-visual prompts to improve industrial defect detection. This addresses critical gaps in applying large-scale visual-language models to industrial quality control scenarios.
Industrial defect detection represents a critical application domain for AI systems, where accuracy directly impacts manufacturing quality and supply chain reliability. The introduction of MMIOC-1M addresses a significant gap in machine learning infrastructure: while large-scale benchmarks exist for general computer vision tasks, industrial-specific datasets remain fragmented and limited in scope. With over one million samples spanning 14 super-categories and 29 industrial scenes, this benchmark enables researchers to develop and validate models specifically optimized for factory environments rather than relying on adaptation from generic vision datasets.
The technical innovation in RTVPNet reflects broader trends in visual-language model development. Rather than requiring manual prompts (points, boxes, masks) that introduce human bias, the system automatically generates refined visual prompts through energy-based sparse sampling. This automation reduces subjective noise and improves consistency in detection workflows. The expert-assisted domain projection mechanism further accelerates adaptation from general models to industrial contexts, addressing the time-to-deployment challenge that manufacturers face.
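To make the idea of energy-based sparse sampling more concrete, here is a minimal sketch. It is not RTVPNet's implementation: the summary above does not describe how the energy is computed or how points are selected, so the per-location energy map, the function name `sparse_sample_prompts`, and the `k` and `min_dist` parameters are all assumptions made for illustration. The sketch treats low energy as strong defect evidence and keeps only a few well-separated locations as point prompts.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def sparse_sample_prompts(energy: List[List[float]], k: int, min_dist: int) -> List[Point]:
    """Pick up to k low-energy (row, col) cells that are at least min_dist apart.

    Hypothetical helper: assumes lower energy means stronger defect evidence,
    and measures separation with Chebyshev distance on the grid.
    """
    if k <= 0:
        return []
    # Rank every cell from lowest to highest energy.
    cells = sorted(
        (energy[r][c], r, c)
        for r in range(len(energy))
        for c in range(len(energy[0]))
    )
    chosen: List[Point] = []
    for _, r, c in cells:
        # Keep a cell only if it is far enough from every prompt already chosen,
        # which is what keeps the resulting prompt set sparse.
        if all(max(abs(r - pr), abs(c - pc)) >= min_dist for pr, pc in chosen):
            chosen.append((r, c))
            if len(chosen) == k:
                break
    return chosen


# Toy 4x4 energy map (made-up values) with two low-energy regions.
energy = [
    [0.9, 0.8, 0.9, 0.9],
    [0.8, 0.1, 0.2, 0.9],
    [0.9, 0.2, 0.9, 0.7],
    [0.9, 0.9, 0.7, 0.3],
]
prompts = sparse_sample_prompts(energy, k=2, min_dist=2)
print(prompts)  # [(1, 1), (3, 3)]
```

In this toy case the two neighbours of the lowest-energy cell are skipped because they fall within `min_dist`, so the second prompt lands on the other low-energy region instead of clustering around the first.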
For industrial stakeholders, this work lowers the barrier to AI-powered quality control. Manufacturers currently struggle to deploy vision systems because of dataset scarcity and model customization costs. A unified benchmark and an open-source baseline enable faster adoption of automated defect detection, potentially reducing production downtime and improving product consistency. The framework supports both open-vocabulary detection (identifying novel defect types) and closed-set detection (recognizing known defect categories), providing flexibility across different manufacturing environments and quality assurance protocols.
- MMIOC-1M is the first large-scale unified benchmark supporting both open-vocabulary and closed-set industrial defect detection with over one million samples.
- RTVPNet eliminates manual prompt dependency through automated visual prompt generation, reducing subjective noise in industrial inspection workflows.
- Expert-assisted domain projection enables rapid adaptation of general vision models to specific industrial manufacturing environments.
- The benchmark and code availability accelerate adoption of large-scale visual-language models for quality control across manufacturing sectors.
- Bidirectional text-visual interaction enhances cross-modal understanding for fine-grained defect classification and analysis.