y0news
#image-text-retrieval (1 article)
AI · Bullish · arXiv – CS AI · Feb 27 · 6/106

ViCLIP-OT: The First Foundation Vision-Language Model for Vietnamese Image-Text Retrieval with Optimal Transport

Researchers introduced ViCLIP-OT, the first foundation vision-language model designed specifically for Vietnamese image-text retrieval. The model combines CLIP-style contrastive learning with a Similarity-Graph Regularized Optimal Transport (SIGROT) loss and reports a 67.34% average Recall@K on the UIT-OpenViIC benchmark, a significant improvement over existing baselines.
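
For readers unfamiliar with the headline metric, here is a minimal sketch of how an average Recall@K score is typically computed for image-text retrieval. It is not the authors' evaluation code. It assumes each query has exactly one correct item, stored at the same index. The function names, the cutoffs `(1, 5, 10)`, and the toy similarity matrix are all hypothetical; the summary does not state which K values or retrieval directions are averaged.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose matching item (same index) ranks in the top k."""
    hits = 0
    for query_idx, row in enumerate(similarity):
        # Rank candidate indices by descending similarity score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if query_idx in ranked[:k]:
            hits += 1
    return hits / len(similarity)


def average_recall(similarity, ks=(1, 5, 10)):
    """Mean of Recall@K over the chosen cutoffs (the cutoffs are an assumption)."""
    return sum(recall_at_k(similarity, k) for k in ks) / len(ks)


# Toy 3x3 text-to-image similarity matrix (made-up numbers, not from the paper).
toy = [
    [0.9, 0.1, 0.2],  # query 0: correct item ranked 1st
    [0.8, 0.3, 0.1],  # query 1: correct item ranked 2nd
    [0.2, 0.4, 0.7],  # query 2: correct item ranked 1st
]

print(recall_at_k(toy, 1))
print(average_recall(toy, ks=(1, 2)))
```

In practice the similarity matrix would come from the cosine similarity between the model's text and image embeddings, and benchmarks with several captions per image need a ground-truth mapping rather than the same-index shortcut used here.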