y0news

#cross-attention News & Analysis

4 articles tagged with #cross-attention. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 5d ago · 6/10

MOSS-Video-Preview: Toward Real-Time Video Understanding via Cross-Attention

Researchers introduce MOSS-Video-Preview, a cross-attention architecture for real-time video understanding in which the model processes frames continuously and revises its answers as new information arrives. The approach achieves a 5x speedup in time-to-first-token and 2.7x higher decoding throughput compared to decoder-only models, while maintaining competitive offline performance.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Multimodal synthesis of MRI and tabular data with diffusion in a joint latent space via cross-attention

Researchers have developed a multimodal latent diffusion model that simultaneously synthesizes MRI brain scans and clinical tabular data (age, sex, body measurements) within a shared latent space using cross-attention mechanisms. Tested on over 10,000 participants from the German National Cohort, the system generates anatomically plausible synthetic medical data where image and tabular attributes remain coherently aligned, representing the first successful joint modeling of volumetric medical images with mixed-type clinical data.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Multi-modal user interface control detection using cross-attention

Researchers have developed an enhanced version of YOLOv5 that combines visual and textual data through cross-attention mechanisms to improve UI control detection in software screenshots. Tested on over 16,000 annotated images across 23 control classes, the multi-modal approach significantly outperforms pixel-only detection, with convolutional fusion showing the strongest results for semantically complex elements.

AI · Bullish · arXiv – CS AI · Mar 9 · 6/10

CASA: Cross-Attention over Self-Attention for Efficient Vision-Language Fusion

Researchers present CASA, a new approach using cross-attention over self-attention for vision-language models that maintains competitive performance while significantly reducing memory and compute costs. The method shows particular advantages for real-time applications like video captioning by avoiding expensive token insertion into language model streams.