Analytics Digests Sources Topics RSS AI Crypto

#reward-manipulation News & Analysis

1 article tagged with #reward-manipulation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

1 articles

AIBearisharXiv – CS AI · Apr 147/10

🧠

Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward

Researchers have discovered a critical vulnerability in Reinforcement Learning with Verifiable Rewards (RLVR), an emerging training paradigm that enhances LLM reasoning abilities. By injecting less than 2% poisoned data into training sets, attackers can implant backdoors that degrade safety performance by 73% when triggered, without modifying the reward verifier itself.