y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#reward-manipulation News & Analysis

1 article tagged with #reward-manipulation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

1 articles
AIBearisharXiv โ€“ CS AI ยท 14h ago7/10
๐Ÿง 

Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward

Researchers have discovered a critical vulnerability in Reinforcement Learning with Verifiable Rewards (RLVR), an emerging training paradigm that enhances LLM reasoning abilities. By injecting less than 2% poisoned data into training sets, attackers can implant backdoors that degrade safety performance by 73% when triggered, without modifying the reward verifier itself.