AIBearish — arXiv · CS AI · 6h ago
Defensive Refusal Bias: How Safety Alignment Fails Cyber Defenders
A study reveals that safety-aligned large language models exhibit "Defensive Refusal Bias": they refuse legitimate cybersecurity defense tasks 2.72x more often when the prompt contains security-sensitive keywords. The researchers found particularly high refusal rates for critical defensive operations such as system hardening (43.8%) and malware analysis (34.3%), suggesting that current AI safety measures key on semantic similarity to harmful requests rather than on understanding intent.
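The failure mode described above can be illustrated with a toy filter. This is a hedged sketch, not the paper's method: all names (`SENSITIVE_KEYWORDS`, `keyword_filter`) are hypothetical, and it stands in for keyword- or similarity-based safety gating that ignores the requester's intent.

```python
# Hypothetical illustration of keyword-triggered refusal (not from the paper).
# A filter that refuses on surface keywords cannot tell defensive intent
# (analyzing malware to protect a network) from offensive intent.

SENSITIVE_KEYWORDS = {"malware", "exploit", "rootkit", "payload"}

def keyword_filter(prompt: str) -> str:
    """Refuse whenever a security-sensitive keyword appears, regardless of intent."""
    if any(kw in prompt.lower() for kw in SENSITIVE_KEYWORDS):
        return "refuse"
    return "allow"

# A legitimate defensive task is refused because it mentions "malware",
# while an unrelated benign prompt passes:
defensive = "Write a detection rule for this malware sample in our SOC."
benign = "Summarize best practices for patch management."

print(keyword_filter(defensive))  # refuse — the defensive refusal bias
print(keyword_filter(benign))     # allow
```

An intent-aware filter would instead need to condition on who is asking and why, which is exactly the capability the study suggests current alignment lacks.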