AI · Bearish · arXiv – CS AI · 4h ago · 7/10
Understanding the Effects of Safety Unalignment on Large Language Models

Research finds that two methods for removing safety guardrails from large language models, jailbreak-tuning and weight orthogonalization, have significantly different effects on model capabilities. Weight orthogonalization produces models that are far more willing to assist with malicious activities while retaining more of their original performance than jailbreak-tuned models, though supervised fine-tuning can help mitigate these risks.
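Weight orthogonalization, as studied in prior work on refusal directions, edits a model's weight matrices so their outputs carry no component along a direction associated with refusal behavior. A minimal NumPy sketch of that projection, with illustrative shapes and a random direction standing in for a learned refusal direction (not details from the paper itself):

```python
import numpy as np

def orthogonalize_weights(W, r):
    """Remove the direction r from the output of weight matrix W.

    W: (d_out, d_in) weight matrix; r: (d_out,) "refusal direction".
    Returns W' = W - r (r^T W), so that W' @ x has zero component
    along r for every input x.
    """
    r = r / np.linalg.norm(r)          # work with a unit direction
    return W - np.outer(r, r @ W)      # subtract the rank-1 projection

# Toy check: outputs of the edited matrix are orthogonal to r.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
r = rng.standard_normal(4)
x = rng.standard_normal(3)

W_edit = orthogonalize_weights(W, r)
residual = abs((W_edit @ x) @ (r / np.linalg.norm(r)))
print(residual < 1e-9)  # prints True
```

Because the edit is a rank-1 subtraction applied directly to the weights, it changes the model everywhere at once rather than steering individual generations, which is consistent with the summary's point that it preserves general performance better than fine-tuning-based jailbreaks.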