AIBullish · arXiv – CS AI · 9h ago · 7/10
🧠
InvThink: Premortem Reasoning for Safer Language Models
InvThink introduces a three-step framework that enhances language model safety by requiring models to enumerate potential harms, analyze consequences, and generate responses under explicit mitigation constraints. The method demonstrates superior safety performance at larger model scales while preserving reasoning capabilities, achieving up to 32% reduction in harmful outputs compared to baseline approaches.
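The three-step scaffold described above can be sketched as a simple prompting pipeline. This is a hedged illustration, not the paper's implementation: the function names, prompt templates, and the `model` callable are all hypothetical stand-ins for whatever LLM interface is actually used.

```python
# Hypothetical sketch of the InvThink three-step scaffold:
# (1) enumerate harms, (2) analyze consequences, (3) answer under constraints.
# `model` is any callable mapping a prompt string to a completion string.

def invthink_respond(model, user_query: str) -> str:
    # Step 1: enumerate potential harms a naive answer could cause.
    harms = model(f"List potential harms of answering: {user_query}")

    # Step 2: analyze the consequences of each enumerated harm.
    analysis = model(f"Analyze the consequences of these harms:\n{harms}")

    # Step 3: generate the response under explicit mitigation constraints
    # derived from the harm analysis.
    constraints = (
        "Answer while avoiding the harms analyzed below; "
        f"apply mitigations where needed.\n{analysis}"
    )
    return model(f"{constraints}\n\nQuestion: {user_query}")


def stub_model(prompt: str) -> str:
    # Placeholder so the sketch runs without an API; echoes the prompt head.
    return f"[completion for: {prompt[:30]}...]"
```

A real deployment would replace `stub_model` with an LLM API call; the point is only that the safety reasoning happens in two explicit passes before the final generation.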