AI · Bullish · arXiv - CS AI · 5h ago
Topological Alignment of Shared Vision-Language Embedding Space
Researchers introduce ToMCLIP, a new framework that improves multilingual vision-language models by using topological alignment to better preserve the geometric structure of shared embedding spaces. The method shows enhanced performance on zero-shot classification and multilingual image retrieval tasks.