AI · Neutral · arXiv – CS AI · 6h ago

AudioCapBench: Quick Evaluation on Audio Captioning across Sound, Music, and Speech

Researchers introduce AudioCapBench, a new benchmark for evaluating how well large multimodal AI models generate captions for audio content across the sound, music, and speech domains. The study tested 13 models from OpenAI and Google Gemini and found that the Gemini models generally outperformed the OpenAI models in overall captioning quality, though all models struggled most with music captioning.