AIBullisharXiv โ CS AI ยท 5h ago
๐ง
Discern Truth from Falsehood: Reducing Over-Refusal via Contrastive Refinement
Researchers introduce DCR (Discernment via Contrastive Refinement), a new method to reduce over-refusal in safety-aligned large language models. The approach helps LLMs better distinguish between genuinely toxic and seemingly toxic prompts, maintaining safety while improving helpfulness without degrading general capabilities.