y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#refusal-bias News & Analysis

1 article tagged with #refusal-bias. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

1 articles
AIBearisharXiv โ€“ CS AI ยท Mar 37/109
๐Ÿง 

Defensive Refusal Bias: How Safety Alignment Fails Cyber Defenders

A study reveals that safety-aligned large language models exhibit "Defensive Refusal Bias," refusing legitimate cybersecurity defense tasks 2.72x more often when they contain security-sensitive keywords. The research found particularly high refusal rates for critical defensive operations like system hardening (43.8%) and malware analysis (34.3%), suggesting current AI safety measures rely on semantic similarity rather than understanding intent.