🧠 AI | ⚪ Neutral | Importance 6/10

AOR-Bench: Do Large Audio Language Models Over-Refuse Pseudo-Harmful Queries?

arXiv – CS AI | Jiaxi Yang, Chaewan Chun, Jason Lucas, Yuchen Yang, Dongwon Lee | June 23, 2026 at 04:00 AM

🤖 AI Summary

Researchers introduce AOR-Bench, the first benchmark measuring over-refusal in Large Audio Language Models (LALMs), that is, cases where safety mechanisms incorrectly reject benign queries. Tests of 12 models across six families reveal widespread over-refusal, particularly when audio context could disambiguate speech that only sounds harmful. The authors also explore mitigation strategies such as Chain-of-Thought reasoning.

Analysis

Large Audio Language Models face a safety alignment challenge that extends beyond text-based AI systems. Over-refusal, where safety mechanisms reject benign requests out of caution, reflects a trade-off between preventing harm and maintaining usability. In audio the issue is harder because the meaning of speech depends heavily on acoustic context and surrounding sounds, which models may fail to process holistically.
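To make the notion concrete, here is a minimal sketch of how an over-refusal rate could be computed: the fraction of benign queries that a model refuses. The summary does not say how AOR-Bench detects refusals, so the keyword list and the names `REFUSAL_MARKERS`, `is_refusal`, and `over_refusal_rate` are illustrative assumptions, not the benchmark's method.

```python
# Hypothetical refusal markers (an assumption for illustration). A real
# evaluation would more likely use a human or model-based judge than
# simple keyword matching.
REFUSAL_MARKERS = (
    "i can't",
    "i cannot",
    "i won't",
    "i'm sorry",
    "i am unable",
)


def is_refusal(response: str) -> bool:
    """Return True if the response contains one of the refusal markers."""
    # Lowercase and normalize curly apostrophes before matching.
    text = response.lower().replace("\u2019", "'")
    return any(marker in text for marker in REFUSAL_MARKERS)


def over_refusal_rate(responses_to_benign_queries: list[str]) -> float:
    """Fraction of responses to benign queries that are refusals."""
    if not responses_to_benign_queries:
        raise ValueError("need at least one response")
    refused = sum(is_refusal(r) for r in responses_to_benign_queries)
    return refused / len(responses_to_benign_queries)


if __name__ == "__main__":
    sample = [
        "I'm sorry, but I can't help with that request.",
        "The speaker is an actor rehearsing a scene; here is the transcript.",
        "I cannot assist with this.",
        "That sound is a fireworks display, so the shout is celebratory.",
    ]
    print(over_refusal_rate(sample))  # 2 of the 4 responses are refusals -> 0.5
```

Because every query in the set is benign by construction, any refusal counts as an error. A lower rate is better, provided refusals of genuinely harmful queries are tracked separately.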

This research addresses a gap in AI safety evaluation. Text-based language models already have refusal benchmarks, but audio presents distinct challenges because spoken communication is multimodal. Background sounds, vocal tone, and acoustic signatures can determine whether a query is genuinely harmful or innocent, yet current safeguards often rely on surface-level content matching rather than contextual understanding.

The findings carry significant implications for deploying LALMs in real-world applications. Over-refusal degrades user experience and utility, potentially limiting adoption in legitimate use cases like accessibility tools, education, and content analysis. Organizations deploying these models face pressure to balance safety compliance with functional performance, creating a tension between regulatory caution and practical effectiveness.

The proposed mitigation strategies, Chain-of-Thought prompting and activation steering, are lightweight in that neither requires retraining. Both aim to get models to take context into account before deciding to refuse. As LALMs become more prevalent, standardized benchmarks like AOR-Bench will be essential for developing safety mechanisms that distinguish genuinely harmful requests from contextually benign ones.
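As a rough illustration of what activation steering can look like, the sketch below uses one common formulation: estimate a "refusal direction" as the difference between mean hidden activations on refused and on complied prompts, then remove that component from a hidden state at inference time. The summary does not describe how the paper implements steering, so this formulation, the toy two-dimensional vectors, and the names `refusal_direction` and `steer` are assumptions. In practice the activations would come from a chosen layer of the model.

```python
import math

Vector = list[float]


def mean_vector(vectors: list[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def refusal_direction(refused: list[Vector], complied: list[Vector]) -> Vector:
    """Unit vector from the mean complied activation to the mean refused one."""
    diff = [r - c for r, c in zip(mean_vector(refused), mean_vector(complied))]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("refused and complied means are identical")
    return [x / norm for x in diff]


def steer(hidden: Vector, direction: Vector, strength: float = 1.0) -> Vector:
    """Subtract `strength` times the component of `hidden` along `direction`."""
    projection = sum(h * d for h, d in zip(hidden, direction))
    return [h - strength * projection * d for h, d in zip(hidden, direction)]


if __name__ == "__main__":
    # Toy activations (made up): refusals differ only along the first axis.
    refused = [[2.0, 0.0], [4.0, 0.0]]
    complied = [[1.0, 0.0], [1.0, 0.0]]
    direction = refusal_direction(refused, complied)
    print(direction)                     # [1.0, 0.0]
    print(steer([5.0, 3.0], direction))  # [0.0, 3.0]
```

With `strength=1.0` the component along the direction is removed entirely, and smaller values only dampen it. No weights change, which is why this kind of intervention counts as lightweight.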

Key Takeaways

→ Over-refusal is widespread across major LALM families, indicating a systemic problem in current safety alignment approaches.
→ Audio context, including background sounds and acoustic information, significantly affects whether speech should be considered harmful.
→ Chain-of-Thought reasoning and activation steering show promise as lightweight mitigation strategies without requiring model retraining.
→ AOR-Bench provides the first standardized evaluation tool for measuring over-refusal in audio models, filling a gap in AI safety research.
→ Balancing safety constraints with usability remains critical for LALM deployment in real-world applications.