y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#model-jailbreak News & Analysis

2 articles tagged with #model-jailbreak. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

2 articles
AIBearisharXiv – CS AI · May 127/10
🧠

LLM-Agnostic Semantic Representation Attack

Researchers have developed Semantic Representation Attack (SRA), a novel adversarial technique that bypasses LLM safety mechanisms by targeting semantic meaning rather than specific text patterns. The method achieves 99.71% attack success rates across 26 open-source models with strong cross-model transferability, raising significant security concerns for deployed AI systems.

AIBearisharXiv – CS AI · Apr 147/10
🧠

Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward

Researchers have discovered a critical vulnerability in Reinforcement Learning with Verifiable Rewards (RLVR), an emerging training paradigm that enhances LLM reasoning abilities. By injecting less than 2% poisoned data into training sets, attackers can implant backdoors that degrade safety performance by 73% when triggered, without modifying the reward verifier itself.