arXiv – CS AI · Feb 27
Unbiased Sliced Wasserstein Kernels for High-Quality Audio Captioning
Researchers developed an unbiased sliced Wasserstein RBF kernel with rotary positional embedding to improve audio captioning systems by addressing exposure bias and temporal relationship issues. The method shows significant improvements in caption quality and text-to-audio retrieval accuracy on AudioCaps and Clotho datasets, while also enhancing audio reasoning capabilities in large language models.
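To make the core idea concrete, here is a minimal sketch of an RBF kernel built on a sliced Wasserstein distance, assuming two equal-length sequences of feature vectors. It uses the standard Monte Carlo estimator over random projections, not the unbiased estimator or the rotary positional embedding described in the summary, and all function and parameter names (`sliced_wasserstein_sq`, `sw_rbf_kernel`, `gamma`, `num_projections`) are illustrative assumptions rather than the authors' code.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere (normalized Gaussian)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def sliced_wasserstein_sq(xs, ys, num_projections=64, seed=0):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance.

    Assumption: xs and ys are non-empty, equal-length lists of points
    (each point a list of floats) with the same dimension.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    dim = len(xs[0])
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_projections):
        theta = random_unit_vector(dim, rng)
        # Project both point sets onto the direction; in 1-D the optimal
        # transport plan simply matches sorted values.
        px = sorted(sum(a * b for a, b in zip(p, theta)) for p in xs)
        py = sorted(sum(a * b for a, b in zip(p, theta)) for p in ys)
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return total / num_projections


def sw_rbf_kernel(xs, ys, gamma=1.0, num_projections=64, seed=0):
    """RBF kernel on the sliced Wasserstein distance: exp(-gamma * SW_2^2)."""
    return math.exp(-gamma * sliced_wasserstein_sq(xs, ys, num_projections, seed))


# Hypothetical toy inputs standing in for audio and text embedding sequences.
audio_feats = [[0.0, 0.0], [1.0, 0.0], [2.0, 1.0]]
text_feats = [[0.5, 0.5], [1.5, 0.0], [2.0, 2.0]]
similarity = sw_rbf_kernel(audio_feats, text_feats, gamma=0.1)
```

Because each projection sorts the points, this plain version treats a sequence as an unordered set and ignores temporal order, which is the kind of gap the positional embedding mentioned in the summary is meant to address.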