AI · Neutral · arXiv – CS AI · 7h ago · 6/10
ReasoningGuard: Safeguarding Large Reasoning Models with Inference-time Safety Aha Moments
Researchers introduce ReasoningGuard, an inference-time safety mechanism designed to keep Large Reasoning Models from generating harmful content during their reasoning processes. The method uses the model's internal attention mechanisms to inject safety-oriented reflections at critical points in the reasoning. It mitigates jailbreak attacks without requiring costly fine-tuning and is reported to outperform nine existing safeguards.