y0news
#jailbreak-prevention · 1 article
AI · Bullish · arXiv – CS AI · 4d ago · 7/102

Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention

Researchers propose Intervened Preference Optimization (IPO) to address a safety issue in Large Reasoning Models: the chain-of-thought reasoning can contain harmful content even when the final response appears safe. The method reportedly reduces harmfulness by over 30% while maintaining reasoning performance.