y0news
AnalyticsDigestsRSSAICrypto
#dataset-corruption1 article
1 articles
AINeutralarXiv โ€“ CS AI ยท 5h ago1
๐Ÿง 

Understanding and Mitigating Dataset Corruption in LLM Steering

Research reveals that contrastive steering, a method for adjusting LLM behavior during inference, is moderately robust to data corruption but vulnerable to malicious attacks when significant portions of training data are compromised. The study identifies geometric patterns in corruption types and proposes using robust mean estimators as a safeguard against unwanted effects.