ViCLIP-OT: The First Foundation Vision-Language Model for Vietnamese Image-Text Retrieval with Optimal Transport
AI Summary
Researchers introduced ViCLIP-OT, the first foundation vision-language model specifically designed for Vietnamese image-text retrieval. The model integrates CLIP-style contrastive learning with a Similarity-Graph Regularized Optimal Transport (SIGROT) loss, achieving significant improvements over existing baselines with a 67.34% average Recall@K on the UIT-OpenViIC benchmark.
Key Takeaways
- ViCLIP-OT is the first foundation vision-language model specifically optimized for Vietnamese image-text retrieval tasks.
- The model improves upon CLIP by 5.75 percentage points on UIT-OpenViIC and 11.72 percentage points in zero-shot evaluation on Crossmodal-3600.
- SIGROT loss integration enhances global cross-modal consistency and reduces modality gap issues in low-resource language settings.
- Extensive testing on three Vietnamese benchmarks demonstrates consistent outperformance in both in-domain and zero-shot scenarios.
- The approach provides a scalable strategy for cross-modal retrieval systems in underrepresented linguistic contexts beyond Vietnamese.
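The summary describes combining CLIP-style contrastive learning with an optimal-transport loss. A minimal NumPy sketch of the two ingredients follows: a symmetric InfoNCE contrastive loss over paired image/text embeddings, and an entropic (Sinkhorn) transport plan computed from the batch similarity matrix. The function names, the temperature and epsilon values, and the way the OT term would be combined with the contrastive loss are illustrative assumptions, not the paper's actual SIGROT formulation (which additionally uses a similarity-graph regularizer).

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Unit-normalize embeddings so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def info_nce_loss(img, txt, temperature=0.07):
    """Symmetric CLIP-style contrastive loss over a batch of paired embeddings."""
    img, txt = l2_normalize(img), l2_normalize(txt)
    logits = img @ txt.T / temperature        # (B, B) similarity matrix
    labels = np.arange(logits.shape[0])       # matched pairs lie on the diagonal

    def xent(l):
        # Numerically stable cross-entropy against the diagonal targets.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the image->text and text->image directions.
    return 0.5 * (xent(logits) + xent(logits.T))

def sinkhorn_plan(sim, epsilon=0.05, n_iters=100):
    """Entropic-OT transport plan between two batches, from a similarity matrix.

    Illustrative: uniform marginals and a fixed iteration count are assumptions.
    """
    K = np.exp(sim / epsilon)                 # Gibbs kernel of the cost
    r = np.full(K.shape[0], 1.0 / K.shape[0]) # uniform row marginal
    c = np.full(K.shape[1], 1.0 / K.shape[1]) # uniform column marginal
    u, v = np.ones_like(r), np.ones_like(c)
    for _ in range(n_iters):                  # alternating marginal projections
        u = r / (K @ v)
        v = c / (K.T @ u)
    return u[:, None] * K * v[None, :]        # transport plan, sums to 1

# Hypothetical combined objective: contrastive term plus an OT alignment term
# that rewards transporting mass along high-similarity pairs.
rng = np.random.default_rng(0)
img, txt = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
sim = l2_normalize(img) @ l2_normalize(txt).T
plan = sinkhorn_plan(sim)
total_loss = info_nce_loss(img, txt) - 0.1 * (plan * sim).sum()
```

In this sketch the Sinkhorn plan softly matches the whole batch of images to the whole batch of texts, which is one way an OT term can encourage the global cross-modal consistency the takeaways mention, beyond CLIP's purely pairwise diagonal objective.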
#vision-language-models #vietnamese-ai #image-text-retrieval #clip #optimal-transport #low-resource-languages #cross-modal #foundation-models #multimedia-ai
Read Original (via arXiv, CS AI)