y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#jailbreak News & Analysis

7 articles tagged with #jailbreak. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

7 articles
AIBearisharXiv โ€“ CS AI ยท Mar 177/10
๐Ÿง 

Sirens' Whisper: Inaudible Near-Ultrasonic Jailbreaks of Speech-Driven LLMs

Researchers developed SWhisper, a framework that uses near-ultrasonic audio to deliver covert jailbreak attacks against speech-driven AI systems. The technique is inaudible to humans but can successfully bypass AI safety measures with up to 94% effectiveness on commercial models.

AIBearisharXiv โ€“ CS AI ยท Mar 177/10
๐Ÿง 

Amplification Effects in Test-Time Reinforcement Learning: Safety and Reasoning Vulnerabilities

Researchers discovered that test-time reinforcement learning (TTRL) methods used to improve AI reasoning capabilities are vulnerable to harmful prompt injections that amplify both safety and harmfulness behaviors. The study shows these methods can be exploited through specially designed 'HarmInject' prompts, leading to reasoning degradation while highlighting the need for safer AI training approaches.

AIBearisharXiv โ€“ CS AI ยท Mar 97/10
๐Ÿง 

Knowing without Acting: The Disentangled Geometry of Safety Mechanisms in Large Language Models

Researchers propose the Disentangled Safety Hypothesis (DSH) revealing that AI safety mechanisms in large language models operate on two separate axes - recognition ('knowing') and execution ('acting'). They demonstrate how this separation can be exploited through the Refusal Erasure Attack to bypass safety controls while comparing architectural differences between Llama3.1 and Qwen2.5.

๐Ÿง  Llama
AINeutralOpenAI News ยท Sep 57/106
๐Ÿง 

GPT-5 bio bug bounty call

OpenAI has launched a Bio Bug Bounty program inviting researchers to test GPT-5's safety protocols using universal jailbreak prompts. The program offers rewards up to $25,000 for successfully identifying vulnerabilities in the upcoming AI model's biological safety measures.

AINeutralarXiv โ€“ CS AI ยท Mar 26/1014
๐Ÿง 

Jailbreak Foundry: From Papers to Runnable Attacks for Reproducible Benchmarking

Researchers introduce Jailbreak Foundry (JBF), a system that automatically converts AI jailbreak research papers into executable code modules for standardized testing. The system successfully reproduced 30 attacks with high accuracy and reduces implementation code by nearly half while enabling consistent evaluation across multiple AI models.

AINeutralOpenAI News ยท Jul 176/106
๐Ÿง 

Agent bio bug bounty call

OpenAI has launched a Bio Bug Bounty program inviting researchers to test ChatGPT agent's safety mechanisms using universal jailbreak prompts. The program offers rewards up to $25,000 for identifying vulnerabilities in the AI system's safety protocols.

AIBearishOpenAI News ยท Apr 196/105
๐Ÿง 

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Large Language Models (LLMs) currently face significant security vulnerabilities from prompt injections and jailbreaks, where attackers can override the model's original instructions with malicious prompts. This highlights a critical weakness in current AI systems' ability to maintain instruction integrity and security.