AINeutralarXiv – CS AI · 7h ago6/10
🧠
Low-Resource Safety Failures Are Action Failures, Not Representation Failures
Researchers discovered that large language models fail to refuse harmful requests in low-resource languages not because they lack the underlying safety representations, but because they cannot properly calibrate their safety decisions across languages. A recalibration approach using minimal target-language examples substantially improves refusal rates, suggesting safety alignment failures stem from decision calibration rather than representation gaps.
🧠 Llama