y0news

#inference-control News & Analysis

2 articles tagged with #inference-control. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Mar 5 · 7/10

Inference-Time Toxicity Mitigation in Protein Language Models

Researchers developed Logit Diff Amplification (LDA) as an inference-time safety mechanism for protein language models to prevent toxic protein generation. The method reduces predicted toxicity rates while maintaining biological plausibility and structural viability, addressing dual-use safety concerns in AI-driven protein design.
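The summary does not give the formula behind LDA, so the following is only a minimal sketch of one plausible form of logit-difference amplification. It assumes two sets of next-token logits, one from a baseline model and one from a hypothetical toxicity-tuned model, and pushes the baseline away from the toxic one by a factor `alpha`. The function names, the sign convention, and the toy numbers are all assumptions for illustration, not the authors' implementation.

```python
import math


def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def amplify_logit_diff(base_logits, toxic_logits, alpha):
    """Assumed form: adjusted = base - alpha * (toxic - base).

    Tokens that the toxicity-tuned model prefers more than the
    baseline does are pushed down; alpha = 0 returns the baseline.
    """
    return [b - alpha * (t - b) for b, t in zip(base_logits, toxic_logits)]


# Toy three-token vocabulary; all values are made up for illustration.
base_logits = [2.0, 1.0, 0.5]
toxic_logits = [2.0, 3.0, 0.5]  # the toxic model favours token 1

adjusted = amplify_logit_diff(base_logits, toxic_logits, alpha=1.0)
probs_before = softmax(base_logits)
probs_after = softmax(adjusted)
```

In this toy case only token 1 differs between the two models, so only its logit moves, and its sampling probability drops while the other tokens keep their relative ranking.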

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

From Refusal Tokens to Refusal Control: Discovering and Steering Category-Specific Refusal Directions

Researchers developed a method to control AI safety refusal behavior using categorical refusal tokens in Llama 3 8B, enabling fine-grained control over when the model refuses harmful versus benign requests. The technique uses steering vectors that can be applied during inference without additional training, which improves safety while reducing over-refusal of harmless prompts.
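As a rough illustration of how a steering vector can be applied at inference time, the sketch below adds a scaled unit direction to a hidden-state vector. The three-dimensional vectors, the `steer` and `projection` helpers, and the strength values are hypothetical; the summary does not describe how the paper extracts its category-specific directions or which layers it modifies.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def projection(hidden, direction):
    """Component of `hidden` along the unit-normalized `direction`."""
    unit = normalize(direction)
    return sum(h * u for h, u in zip(hidden, unit))


def steer(hidden, direction, strength):
    """Shift a hidden state along a direction without changing any weights.

    Positive strength moves toward the direction (more refusal in this
    toy setup); negative strength moves away from it (less refusal).
    """
    unit = normalize(direction)
    return [h + strength * u for h, u in zip(hidden, unit)]


# Hypothetical refusal direction for one category, and a toy hidden state.
refusal_direction = [3.0, 0.0, 4.0]
hidden_state = [1.0, 2.0, -1.0]

more_refusal = steer(hidden_state, refusal_direction, strength=2.0)
less_refusal = steer(hidden_state, refusal_direction, strength=-2.0)
```

Because the direction is normalized first, the projection of the hidden state onto it changes by exactly `strength`, which is what makes the intervention tunable per category without retraining.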

🧠 Llama