AI · Neutral · arXiv – CS AI · 18h ago · 6/10
Conan-embedding-v3: Fusing Modality-Specific Models for Omni-Modal Embedding
Researchers introduce Conan-embedding-v3, a framework that builds a unified embedding space across multiple data modalities (text, image, video, audio, documents) by training modality-specific models independently and then fusing them into a single backbone. The work also identifies and solves 'Projector Drift', a failure mode that degrades audio retrieval performance when external encoders are integrated.