arXiv – CS AI
MOSS-Audio Technical Report
MOSS-Audio is a unified audio-language model that covers speech, environmental sound, and music understanding, with capabilities in captioning, question answering, and temporal grounding. The model introduces DeepStack cross-layer feature injection and time markers that supply explicit temporal cues. It is released in 4B and 8B variants for instruction-following and reasoning tasks.