AI · Bullish · arXiv - CS AI · 5h ago
CAPT: Confusion-Aware Prompt Tuning for Reducing Vision-Language Misalignment
Researchers propose CAPT, a Confusion-Aware Prompt Tuning framework that addresses systematic misclassifications in vision-language models like CLIP by learning from the model's own confusion patterns. The method uses a Confusion Bank to model persistent category misalignments and introduces specialized modules to capture both semantic and sample-level confusion cues.
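To make the "Confusion Bank" idea concrete, here is a minimal sketch of one way a model's recurring mistakes could be recorded and ranked. It is an illustration only, not the paper's implementation: the `ConfusionBank` class, its method names, and the toy labels are all assumptions, and the summary above does not specify how CAPT stores or uses these patterns.

```python
from collections import Counter


class ConfusionBank:
    """Hypothetical running tally of (true, predicted) misclassification pairs."""

    def __init__(self):
        # Maps (true_label, predicted_label) -> number of times that error occurred.
        self.counts = Counter()

    def update(self, true_labels, predicted_labels):
        """Record every mismatch between ground truth and model prediction."""
        for true, pred in zip(true_labels, predicted_labels):
            if true != pred:
                self.counts[(true, pred)] += 1

    def top_confusions(self, k=3):
        """Return the k most frequent confusion pairs with their counts."""
        return self.counts.most_common(k)


# Toy predictions standing in for a zero-shot classifier's outputs (made-up data).
bank = ConfusionBank()
bank.update(
    ["cat", "dog", "wolf", "dog", "cat"],
    ["cat", "wolf", "dog", "wolf", "lynx"],
)
bank.update(["dog", "cat"], ["wolf", "cat"])

top = bank.top_confusions(2)
print(top)
```

In this toy run, "dog" predicted as "wolf" is the most persistent error. A confusion-aware method could plausibly use such pairs to decide which categories need extra attention during prompt tuning.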