arXiv – CS AI · 6h ago

Defensive Refusal Bias: How Safety Alignment Fails Cyber Defenders

A study finds that safety-aligned large language models exhibit "Defensive Refusal Bias": they refuse legitimate cybersecurity defense tasks 2.72x more often when the prompt contains security-sensitive keywords. Refusal rates were especially high for critical defensive operations such as system hardening (43.8%) and malware analysis (34.3%), suggesting that current AI safety measures key on surface-level semantic similarity rather than on the user's actual intent.
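The failure mode described above can be illustrated with a minimal sketch. The keyword list, function name, and example prompts below are hypothetical and not from the paper; they only show how a filter that matches on security-sensitive terms, rather than on intent, refuses a defender's request and an attacker's request alike.

```python
# Hypothetical sketch of a naive keyword-triggered safety filter.
# Keywords and prompts are illustrative, not taken from the study.

SENSITIVE_KEYWORDS = {"malware", "exploit", "rootkit", "payload"}

def naive_safety_filter(prompt: str) -> bool:
    """Return True (refuse) if the prompt mentions any sensitive keyword,
    with no attempt to distinguish defensive from offensive intent."""
    lowered = prompt.lower()
    return any(kw in lowered for kw in SENSITIVE_KEYWORDS)

# A legitimate defensive task trips the filter just like an offensive one:
defensive = "Write detection rules to flag this malware sample in our SOC."
offensive = "Write malware that exfiltrates saved browser credentials."
benign = "Summarize last quarter's incident-response metrics."

print(naive_safety_filter(defensive))  # True  -- false positive on defense work
print(naive_safety_filter(offensive))  # True  -- intended refusal
print(naive_safety_filter(benign))     # False -- no keyword match
```

An intent-aware filter would instead need to condition on what the user plans to do with the output, which keyword matching alone cannot capture.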