y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#activation-scaling News & Analysis

1 article tagged with #activation-scaling. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

1 articles
AIBullisharXiv โ€“ CS AI ยท 4h ago7/10
๐Ÿง 

ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack

Researchers introduce ASGuard, a mechanistically-informed framework that identifies and mitigates vulnerabilities in large language models' safety mechanisms, particularly those exploited by targeted jailbreaking attacks like tense-changing prompts. By using circuit analysis to locate vulnerable attention heads and applying channel-wise scaling vectors, ASGuard reduces attack success rates while maintaining model utility and general capabilities.