AIBearisharXiv โ CS AI ยท 8h ago7/10
๐ง
One Single Hub Text Breaks CLIP: Identifying Vulnerabilities in Cross-Modal Encoders via Hubness
Researchers have identified a critical vulnerability in CLIP and similar cross-modal encoders where a single hub text embedding can achieve similarity scores comparable to human-written captions across many unrelated images. This reveals fundamental weaknesses in how these models project text and images into shared embedding spaces, threatening the reliability of vision-language applications.