y0news
#jailbreak-defense
1 article
AI · Bullish · arXiv – CS AI · 17h ago · 6/10
🧠

Reasoned Safety Alignment: Ensuring Jailbreak Defense via Answer-Then-Check

Researchers introduce Answer-Then-Check, a safety alignment approach for large language models that has the model evaluate the safety of a response before presenting it to the user. The approach relies on a new 80K-sample dataset, Reasoned Safety Alignment (ReSA), and the authors report stronger jailbreak defense while general reasoning capabilities are preserved.

๐Ÿข Hugging Face