Failure-Aware Refinement of Vision-Language Model for Lithography Defect Detection
Researchers propose a two-stage vision-language framework using Qwen3-VL with LoRA fine-tuning to detect semiconductor lithography defects, then employ a refinement module trained on first-stage failures to improve accuracy beyond standard single-stage approaches.
This research addresses a critical challenge in semiconductor manufacturing, where detecting microscopic pattern defects directly affects production quality and yield. The proposed approach uses vision-language models (systems trained on both visual and textual data) to identify and classify lithography defects such as bridges, burrs, and contamination in inspection images. The innovation lies in the two-stage architecture: a first stage produces initial predictions, and a second stage refines them based on failure patterns learned from the first stage's errors.
Semiconductor inspection automation has grown increasingly important as feature sizes shrink and defect detection becomes more complex. Traditional computer vision methods struggle with subtle defects and edge cases. Vision-language models offer advantages by combining visual understanding with contextual reasoning, though they require substantial computational resources and careful training strategies. LoRA (Low-Rank Adaptation) addresses part of that cost: it keeps the pretrained weights frozen and trains only small low-rank adapter matrices, so the model can be customized without full retraining.
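To make the efficiency argument for LoRA concrete, the sketch below compares trainable parameter counts for a single weight matrix. In LoRA, a frozen weight matrix of shape d_out × d_in is augmented by the product of two trainable matrices of shapes d_out × r and r × d_in, so the trainable count is r × (d_out + d_in). The layer sizes and rank used here are illustrative assumptions, not values reported for the Qwen3-VL setup described above.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a rank-`rank` LoRA adapter.

    The adapter is two matrices: B (d_out x rank) and A (rank x d_in).
    The original weight matrix stays frozen.
    """
    return rank * (d_out + d_in)


# Illustrative sizes only (assumed, not taken from the paper).
d_out, d_in, rank = 4096, 4096, 8

full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {lora / full:.4%}")
```

Under these assumed sizes the adapter trains well under one percent of the parameters in that layer, which is the source of the reduced overhead.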
The refinement stage is the key technical contribution. Rather than accepting the initial model outputs, the system explicitly learns to correct common errors by training on the first stage's failure cases. This failure-aware approach mirrors human inspection workflows, where experienced technicians review questionable detections. For semiconductor manufacturers, improved defect detection could translate to fewer escaped defects reaching customers and lower warranty costs.
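One way to picture the failure-aware step is as a filter over first-stage predictions that keeps only the mistakes, tagged by error type, as training material for the refinement module. The sketch below is a minimal illustration under assumptions: the `Sample` class, the `"none"` label for defect-free images, and the function names are hypothetical, and the summary above does not describe how the actual refinement module is trained.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

NO_DEFECT = "none"  # assumed label for a defect-free image


@dataclass
class Sample:
    image_id: str
    truth: str      # ground-truth label, e.g. "bridge", "burr", "contamination"
    predicted: str  # first-stage prediction


def failure_type(s: Sample) -> Optional[str]:
    """Classify a first-stage error; return None when the prediction is correct."""
    if s.predicted == s.truth:
        return None
    if s.truth == NO_DEFECT:
        return "false_positive"      # defect reported on a clean image
    if s.predicted == NO_DEFECT:
        return "missed_defect"       # real defect reported as clean
    return "misclassification"       # defect found, wrong class


def collect_failures(samples: List[Sample]) -> List[Tuple[Sample, str]]:
    """Keep only the samples the first stage got wrong, with their error type."""
    failures = []
    for s in samples:
        kind = failure_type(s)
        if kind is not None:
            failures.append((s, kind))
    return failures


first_stage_output = [
    Sample("img_001", "bridge", "bridge"),
    Sample("img_002", NO_DEFECT, "burr"),
    Sample("img_003", "contamination", NO_DEFECT),
    Sample("img_004", "burr", "bridge"),
]

for sample, kind in collect_failures(first_stage_output):
    print(sample.image_id, kind)
```

The resulting list of tagged failures is what a second-stage model would learn from in this simplified picture; correct predictions are dropped so the refinement stage concentrates on the error patterns.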
The framework's practical deployment depends on validation across diverse lithography processes and fab environments. If it holds up, semiconductor equipment suppliers and manufacturers that integrate the technology could gain higher inspection accuracy and faster processing than purely manual or traditional algorithm-based systems provide.
- A two-stage vision-language framework combines initial Qwen3-VL defect detection with a learned refinement module for improved accuracy
- Failure-aware training on first-stage prediction errors enables the model to correct false positives, missed defects, and misclassifications
- LoRA fine-tuning reduces computational overhead while customizing the model for semiconductor lithography applications
- Semiconductor manufacturers could achieve cost savings through fewer escaped defects and faster automated inspection cycles
- The framework addresses limitations of single-stage approaches by explicitly learning from error patterns rather than ignoring them