#audio-understanding News & Analysis

4 articles tagged with #audio-understanding. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

MOSS-Audio Technical Report

MOSS-Audio is a unified audio-language model for speech, environmental sound, and music understanding, covering captioning, question answering, and temporal grounding. It introduces DeepStack cross-layer feature injection and time markers that provide explicit temporal cues, and it is released in 4B and 8B variants for instruction-following and reasoning tasks.

AI · Neutral · arXiv – CS AI · May 28 · 7/10

KVoiceBench, KOpenAudioBench, and KMMAU: Agent-Driven Korean Speech Benchmarks for Evaluating SpeechLMs

Researchers introduce three new Korean speech benchmarks (KVoiceBench, KOpenAudioBench, and KMMAU) totaling 12,345 samples to evaluate multilingual speech language models, addressing the gap in non-English evaluation. The study reveals significant performance disparities between English and Korean across eight SpeechLMs, exposing weaknesses invisible to English-only testing.

AI · Bullish · arXiv – CS AI · Apr 14 · 7/10

Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music

Researchers introduce Audio Flamingo Next (AF-Next), an advanced open-source audio-language model that processes speech, sound, and music with support for inputs up to 30 minutes. The model incorporates a new temporal reasoning approach and demonstrates competitive or superior performance compared to larger proprietary alternatives across 20 benchmarks.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

MusTBENCH: Benchmarking and Advancing Temporal Grounding in Music LLMs

Researchers introduce MusTBENCH, a benchmark for evaluating the temporal grounding capabilities of Large Audio-Language Models (LALMs) in music understanding. They also propose MusT, an optimization framework that significantly improves model performance on time-sensitive musical tasks involving instrument entries and rhythmic transitions.