Reasoning-targeted Jailbreak Attacks on Large Reasoning Models via Semantic Triggers and Psychological Framing
Researchers have identified a critical vulnerability in Large Reasoning Models (LRMs) such as DeepSeek R1 and OpenAI o4-mini that allows attackers to inject harmful content into a model's intermediate reasoning process while leaving its final answers unchanged. Their Psychology-based Reasoning-targeted Jailbreak Attack (PRJA) framework, which exploits semantic triggers and psychological framing, achieves an 83.6% attack success rate, exposing an understudied safety gap in AI systems deployed in high-stakes domains.