AI · Neutral · arXiv – CS AI · 5h ago · 5/10
Acoustic Cue Alignment in Audio Language Models for Speech Emotion Recognition
Researchers demonstrate that instruction-following audio language models can use explicit acoustic cues for speech emotion recognition: aligned acoustic tokens improve performance on standard benchmarks while remaining grounded in the underlying audio signal.