10 articles tagged with #multimodal-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv › CS AI · 1d ago · 7/10
🧠 Researchers propose Schema-Adaptive Tabular Representation Learning, which uses LLMs to convert structured clinical data into semantic embeddings that transfer across different electronic health record schemas without retraining. When combined with imaging data for dementia diagnosis, the method achieves state-of-the-art results and outperforms board-certified neurologists on retrospective diagnostic tasks.
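As a rough illustration of the schema-transfer idea, the sketch below serializes rows from two differently named EHR schemas into "column: value" text and embeds them with an off-the-shelf text encoder (sentence-transformers and all-MiniLM-L6-v2 are my stand-ins, not the paper's model):

```python
# Sketch, not the paper's pipeline: serialize rows to "column: value" text
# so an LLM text encoder maps records from different EHR schemas into one
# shared embedding space without retraining.
from sentence_transformers import SentenceTransformer

def serialize_row(row: dict) -> str:
    # Column names carry the semantics, so renamed or reordered fields
    # change only the text, never the model.
    return "; ".join(f"{col}: {val}" for col, val in row.items() if val is not None)

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for the paper's LLM

# Two hospitals describing the same patient under different schemas.
site_a = {"age_years": 74, "mmse_score": 21, "hippocampal_volume_ml": 2.9}
site_b = {"Age": 74, "MiniMentalState": 21, "HippocampusVol_mL": 2.9}

emb_a, emb_b = encoder.encode([serialize_row(site_a), serialize_row(site_b)])
# Semantically matching records land nearby despite the schema mismatch.
```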
AI · Bullish · arXiv › CS AI · Mar 3 · 7/10
🧠 Researchers developed a new disentangled multimodal framework that combines histopathology and transcriptome data for improved cancer diagnosis and prognosis. The framework addresses key challenges in medical AI, including multimodal data heterogeneity and dependency on paired datasets, through innovative fusion techniques and knowledge distillation strategies.
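The paired-data dependency can be eased with cross-modal knowledge distillation; the toy sketch below (my reading of the summary, not the authors' code) distills a teacher that sees fused histopathology + transcriptome features into an image-only student:

```python
# Hedged sketch of cross-modal knowledge distillation: a teacher fused on
# paired histopathology + transcriptome features distills into an
# image-only student, reducing the need for paired data at inference time.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
teacher = torch.nn.Linear(512 + 256, 4)  # consumes fused (image + omics) features
student = torch.nn.Linear(512, 4)        # image features only

img = torch.randn(8, 512)                # dummy histopathology embeddings
omics = torch.randn(8, 256)              # dummy transcriptome embeddings

with torch.no_grad():
    t_logits = teacher(torch.cat([img, omics], dim=-1))
s_logits = student(img)

# KL distillation: the student mimics the multimodal teacher's soft labels.
T = 2.0
kd_loss = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                   F.softmax(t_logits / T, dim=-1),
                   reduction="batchmean") * T * T
kd_loss.backward()
```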
AI · Neutral · arXiv › CS AI · 1d ago · 6/10
🧠 Researchers introduce MODIX, a training-free framework that dynamically optimizes how Vision-Language Models allocate attention across multimodal inputs by adjusting positional encoding based on information density rather than uniform token assignment. The approach improves reasoning performance without modifying model parameters, suggesting positional encoding should be treated as an adaptive resource in multimodal transformer architectures.
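A toy version of density-aware position assignment, with embedding norm as an assumed stand-in for the paper's information-density measure:

```python
# Toy illustration (assumptions mine; MODIX's actual density measure and
# encoding adjustment are in the paper): instead of giving every token one
# position step, tokens carrying more information advance the position
# index faster, compressing redundant tokens into a narrower span.
import numpy as np

rng = np.random.default_rng(0)
token_embs = rng.normal(size=(16, 64))        # dummy mixed image/text tokens
density = np.linalg.norm(token_embs, axis=-1)  # stand-in information-density score

# Uniform baseline would be positions 0..15. Adaptive: cumulative density,
# rescaled so the sequence still spans the same positional range.
adaptive = np.cumsum(density)
adaptive = adaptive / adaptive[-1] * (len(density) - 1)

print(np.round(adaptive, 2))  # fractional "positions" usable by e.g. RoPE
```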
AI · Neutral · arXiv › CS AI · 2d ago · 6/10
🧠 Researchers introduce CFMS, a two-stage framework combining multimodal large language models with symbolic reasoning to improve tabular data comprehension for question answering and fact verification tasks. The approach achieves competitive results on the WikiTQ and TabFact benchmarks while demonstrating particular robustness with large tables and smaller model architectures.
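A minimal two-stage pipeline in this spirit might look like the following, where `call_mllm` is a hypothetical stand-in for stage 1 (the real system's prompting and program format are in the paper):

```python
# Sketch of a two-stage neuro-symbolic table QA loop: stage 1 has the model
# write a small program over the table; stage 2 executes it symbolically,
# which scales better to large tables than putting every cell in the prompt.
import pandas as pd

table = pd.DataFrame({"country": ["France", "Brazil", "Japan"],
                      "gold": [10, 7, 12]})

def call_mllm(question: str, schema: list) -> str:
    # Hypothetical stage 1: an MLLM sees the question plus only the table
    # schema and returns a pandas expression. Hard-coded here to illustrate.
    return "table.loc[table['gold'].idxmax(), 'country']"

question = "Which country won the most gold medals?"
program = call_mllm(question, list(table.columns))
answer = eval(program, {"table": table})  # stage 2: symbolic execution (demo only)
print(answer)                             # -> "Japan"
```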
AI · Neutral · arXiv › CS AI · 2d ago · 6/10
🧠 Researchers introduce Cross-lingual Speech Language Models (CSLM), an efficient training method for building multilingual speech AI systems using discrete speech tokens. The approach achieves cross-modal and cross-lingual alignment through continual pre-training and instruction fine-tuning, enabling effective speech LLMs without requiring massive datasets.
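One common recipe for producing discrete speech tokens, shown below as an assumption rather than the CSLM specifics, is to quantize self-supervised speech features with k-means and append the resulting unit IDs to the LLM vocabulary:

```python
# Illustrative recipe, not the CSLM specifics: quantize frame-level
# self-supervised speech features with k-means; each cluster id becomes a
# discrete speech token added to the LLM vocabulary before continual
# pre-training.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
frames = rng.normal(size=(1000, 768))  # dummy stand-in for e.g. HuBERT features

km = KMeans(n_clusters=100, random_state=0).fit(frames)
unit_ids = km.predict(frames)           # one discrete unit per speech frame

new_tokens = [f"<speech_{u}>" for u in unit_ids[:10]]
print(new_tokens)  # these token strings would extend the LLM tokenizer
```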
AI · Neutral · arXiv › CS AI · 3d ago · 6/10
🧠 OmniPrism introduces a new visual concept disentanglement approach for AI image generation that separates multiple visual aspects (content, style, composition) to enable more controlled and creative outputs. The method uses a contrastive training pipeline and a new 200K-pair dataset to train diffusion models that can incorporate disentangled concepts while maintaining fidelity to text prompts.
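A minimal contrastive objective for isolating one such aspect (style here; the pairing scheme and encoder are my illustrative assumptions, not OmniPrism's pipeline):

```python
# InfoNCE-style sketch of concept disentanglement: images sharing the same
# *style* are pulled together in a style subspace, while images that share
# only content act as negatives, pushing the subspace to encode style alone.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
style_head = torch.nn.Linear(128, 32)   # projects features to a style subspace
feats = torch.randn(6, 128)             # dummy features: (0,1), (2,3), (4,5) share style
z = F.normalize(style_head(feats), dim=-1)

logits = z @ z.T / 0.1                  # cosine similarities / temperature
logits = logits.masked_fill(torch.eye(6, dtype=torch.bool), float("-inf"))
targets = torch.tensor([1, 0, 3, 2, 5, 4])  # each sample's same-style partner
loss = F.cross_entropy(logits, targets)
loss.backward()
```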
AI · Bullish · arXiv › CS AI · Apr 6 · 6/10
🧠 Researchers introduce SmartCLIP, a new AI model that improves upon CLIP by addressing information misalignment between images and text through modular vision-language alignment. The approach enables better disentanglement of visual representations while preserving cross-modal semantic information, demonstrating superior performance across various tasks.
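One speculative way to make alignment "modular" is to let each caption align with only a learned mask over the image embedding; the sketch below illustrates that idea under my own assumptions, not SmartCLIP's actual objective:

```python
# Speculative sketch: a short caption matches only the sub-part of the
# image representation it actually describes, via a learned soft mask,
# instead of being forced to match the whole vector as in vanilla CLIP.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
img = torch.randn(4, 64, requires_grad=True)          # dummy image embeddings
txt = torch.randn(4, 64, requires_grad=True)          # dummy caption embeddings
mask_logits = torch.zeros(4, 64, requires_grad=True)  # per-caption learned masks

mask = torch.sigmoid(mask_logits)        # soft selection of relevant dimensions
masked_img = F.normalize(img * mask, dim=-1)
masked_txt = F.normalize(txt * mask, dim=-1)  # compare within the same subspace

logits = masked_img @ masked_txt.T / 0.07
loss = F.cross_entropy(logits, torch.arange(4))
loss.backward()
```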
AI · Bullish · arXiv › CS AI · Apr 6 · 6/10
🧠 Researchers introduce Contrastive Fusion (ConFu), a new multimodal machine learning framework that aligns individual modalities and their fused combinations in a unified representation space. The approach captures higher-order dependencies between multiple modalities while maintaining strong pairwise relationships, demonstrating competitive performance on retrieval and classification tasks.
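A compact sketch of the fused-plus-pairwise alignment idea (dimensions, the fusion head, and the audio third modality are my assumptions, not the authors' code):

```python
# Hedged sketch of contrastive fusion: besides the usual pairwise
# image-text alignment, the *fused* image+text embedding is contrastively
# aligned with a third modality, so the shared space also captures
# higher-order, beyond-pairwise structure.
import torch
import torch.nn.functional as F

def info_nce(a, b, temp=0.07):
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.T / temp
    return F.cross_entropy(logits, torch.arange(len(a)))

torch.manual_seed(0)
img, txt, aud = (torch.randn(8, 64, requires_grad=True) for _ in range(3))
fuse = torch.nn.Linear(128, 64)              # simple fusion head (assumption)

pairwise = info_nce(img, txt)                # standard two-modality term
fused = fuse(torch.cat([img, txt], dim=-1))  # fused image+text representation
higher_order = info_nce(fused, aud)          # align the fusion with modality 3

loss = pairwise + higher_order
loss.backward()
```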
AI · Bullish · arXiv › CS AI · Mar 6 · 6/10
🧠 Researchers introduce DP-MTV, the first framework enabling privacy-preserving multimodal in-context learning for vision-language models using differential privacy. The system allows processing hundreds of demonstrations while maintaining formal privacy guarantees, achieving competitive performance on benchmarks like VizWiz with only minimal accuracy loss.
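Formal guarantees of this kind typically come from the Gaussian mechanism; the toy sketch below privately averages per-demonstration vectors (a simplification of whatever DP-MTV actually releases, with my own clipping and accounting choices):

```python
# Toy Gaussian-mechanism sketch: clip each demonstration's contribution,
# average, then add noise calibrated so the released vector satisfies
# (epsilon, delta)-differential privacy.
import numpy as np

rng = np.random.default_rng(0)
demo_vecs = rng.normal(size=(200, 768))  # one vector per private demonstration

clip = 1.0
norms = np.linalg.norm(demo_vecs, axis=1, keepdims=True)
clipped = demo_vecs * np.minimum(1.0, clip / norms)  # bound each record's influence

n = len(clipped)
epsilon, delta = 1.0, 1e-5
sensitivity = clip / n                   # L2 sensitivity of the mean
sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon

private_vector = clipped.mean(axis=0) + rng.normal(scale=sigma, size=768)
```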
AI · Neutral · arXiv › CS AI · Mar 2 · 6/10
🧠 Researchers conducted a systematic benchmark study of multimodal fusion between Electronic Health Records (EHR) and chest X-rays for clinical decision support, revealing when and how combining data modalities improves healthcare AI performance. The study found that multimodal fusion helps when data is complete but that its benefits degrade under realistic missing-data scenarios, and it released an open-source benchmarking toolkit for reproducible evaluation.
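A late-fusion baseline with missing-modality masking, the kind of setup such a benchmark would compare (the heads and dimensions below are dummies, not the toolkit's API):

```python
# Illustrative late fusion with missing-modality handling: per-modality
# predictions are averaged over whichever modalities are actually present,
# one simple way to measure how the fusion benefit degrades when data
# is missing.
import torch

torch.manual_seed(0)
ehr_head = torch.nn.Linear(32, 2)        # EHR branch (dummy dimensions)
cxr_head = torch.nn.Linear(64, 2)        # chest X-ray branch

ehr = torch.randn(4, 32)
cxr = torch.randn(4, 64)
has_cxr = torch.tensor([1., 1., 0., 1.])  # patient 2 has no X-ray

logits_ehr = ehr_head(ehr)
logits_cxr = cxr_head(cxr) * has_cxr[:, None]  # zero out the missing modality

n_avail = 1 + has_cxr[:, None]            # modalities available per patient
fused_logits = (logits_ehr + logits_cxr) / n_avail
print(fused_logits.softmax(dim=-1))
```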