y0news
#image-text
1 article
AI · Neutral · arXiv – CS AI · 5h ago

ITO: Images and Texts as One via Synergizing Multiple Alignment and Training-Time Fusion

Researchers propose ITO, a framework for image-text representation learning that addresses the modality gap by combining multiple alignment with training-time fusion. The method outperforms existing baselines on classification, retrieval, and multimodal benchmarks, and it stays efficient because the fusion module is discarded at inference.
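
The "fuse during training, discard at inference" idea can be illustrated with a toy sketch. This is not the ITO implementation: the class `ToyDualEncoder`, its random linear projections, and the concatenation-based fusion layer are all hypothetical stand-ins chosen only to show the structure, and no training loop or loss is included.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def linear(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class ToyDualEncoder:
    """Hypothetical two-tower model with a training-only fusion layer."""

    def __init__(self, image_dim, text_dim, embed_dim, seed=0):
        rng = random.Random(seed)
        # Stand-ins for the image and text encoders (assumption: plain
        # linear projections instead of real vision/language backbones).
        self.image_proj = random_matrix(embed_dim, image_dim, rng)
        self.text_proj = random_matrix(embed_dim, text_dim, rng)
        # Fusion layer over the concatenated embeddings. It is only
        # touched in forward_train, so it can be dropped after training.
        self.fusion = random_matrix(embed_dim, 2 * embed_dim, rng)

    def encode_image(self, image):
        return l2_normalize(linear(self.image_proj, image))

    def encode_text(self, text):
        return l2_normalize(linear(self.text_proj, text))

    def forward_train(self, image, text):
        """Training path: both embeddings plus a fused representation."""
        z_img = self.encode_image(image)
        z_txt = self.encode_text(text)
        fused = linear(self.fusion, z_img + z_txt)  # list concatenation
        return z_img, z_txt, fused

    def similarity(self, image, text):
        """Inference path: cosine similarity from the two encoders only."""
        z_img = self.encode_image(image)
        z_txt = self.encode_text(text)
        return sum(a * b for a, b in zip(z_img, z_txt))
```

In this sketch, deleting `model.fusion` after training leaves `similarity` unchanged, which is the sense in which a training-time fusion module adds no inference cost.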