AIBullisharXiv โ CS AI ยท 10h ago4/10
๐ง
LAMB: LLM-based Audio Captioning with Modality Gap Bridging via Cauchy-Schwarz Divergence
Researchers have developed LAMB, a new AI framework that improves automated audio captioning by better aligning audio features with large language models through Cauchy-Schwarz divergence optimization. The system achieved state-of-the-art performance on AudioCaps dataset by bridging the modality gap between audio and text embeddings.