y0news

#refusal News & Analysis

2 articles tagged with #refusal. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Apr 10 · 7/10

Blind Refusal: Language Models Refuse to Help Users Evade Unjust, Absurd, and Illegitimate Rules

Researchers document 'blind refusal', a phenomenon where safety-trained language models refuse to help users circumvent rules without evaluating whether those rules are legitimate, unjust, or have justified exceptions. The study shows models refuse 75.4% of requests to break rules even when the rules are indefensible and pose no safety risk.

🧠 GPT-5
AI · Bullish · OpenAI News · Aug 7 · 7/10 · 6

From hard refusals to safe-completions: toward output-centric safety training

OpenAI introduces a new 'safe-completions' approach in GPT-5 that moves beyond simple refusals to provide nuanced, helpful responses while maintaining safety standards. This output-centric safety training method better handles dual-use prompts by generating contextually appropriate completions rather than blanket rejections.