y0news

#adversarial-examples News & Analysis

2 articles tagged with #adversarial-examples. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Apr 20 · 7/10
🧠

Reasoning-targeted Jailbreak Attacks on Large Reasoning Models via Semantic Triggers and Psychological Framing

Researchers have discovered a critical vulnerability in Large Reasoning Models (LRMs) like DeepSeek R1 and OpenAI o4-mini that allows attackers to inject harmful content into the reasoning process while keeping final answers unchanged. The Psychology-based Reasoning-targeted Jailbreak Attack (PRJA) framework achieves an 83.6% success rate by exploiting semantic triggers and psychological principles, revealing a previously understudied safety gap in AI systems deployed in high-stakes domains.

๐Ÿข OpenAI
AI · Bearish · OpenAI News · Feb 24 · 6/10
🧠

Attacking machine learning with adversarial examples

Adversarial examples are specially crafted inputs designed to fool machine learning models into making incorrect predictions — optical illusions for AI systems. The article explores how these attacks work across different input mediums and highlights why defending ML systems against them remains difficult.
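To make the idea concrete, here is a minimal sketch of the fast gradient sign method (FGSM), one classic way such inputs are crafted. The model and data are toy stand-ins invented for illustration (a fixed logistic-regression classifier on a 4-feature input), not anything from the article: the point is only that a small, targeted perturbation in the direction of the loss gradient flips the model's prediction.

```python
import numpy as np

# Hypothetical fixed logistic-regression model (weights chosen for illustration).
w = np.array([0.8, -0.5, 0.3, 0.9])
b = -0.2

def predict_prob(x):
    """Probability the model assigns to class 1."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

def fgsm(x, epsilon=0.3):
    """Perturb x by epsilon in the sign of the loss gradient.

    For logistic regression with true label 1, the gradient of the
    cross-entropy loss with respect to the input is (p - 1) * w,
    so each feature moves by +/- epsilon in the direction that
    most increases the loss.
    """
    p = predict_prob(x)
    grad = (p - 1.0) * w
    return x + epsilon * np.sign(grad)

x = np.array([0.5, 0.5, 0.5, 0.5])  # clean input, classified as class 1
x_adv = fgsm(x)                     # perturbed input, flips to class 0

print(predict_prob(x))      # > 0.5: class 1
print(predict_prob(x_adv))  # < 0.5: class 0
```

Each feature changes by at most 0.3, yet the prediction flips — the "optical illusion" effect the article describes. Real attacks apply the same gradient trick to images or text embeddings with perturbations small enough to be imperceptible to humans.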