7 articles tagged with #jailbreak. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AIBearisharXiv โ CS AI ยท Mar 177/10
๐ง Researchers developed SWhisper, a framework that uses near-ultrasonic audio to deliver covert jailbreak attacks against speech-driven AI systems. The technique is inaudible to humans but can successfully bypass AI safety measures with up to 94% effectiveness on commercial models.
AIBearisharXiv โ CS AI ยท Mar 177/10
๐ง Researchers discovered that test-time reinforcement learning (TTRL) methods used to improve AI reasoning capabilities are vulnerable to harmful prompt injections that amplify both safety and harmfulness behaviors. The study shows these methods can be exploited through specially designed 'HarmInject' prompts, leading to reasoning degradation while highlighting the need for safer AI training approaches.
AIBearisharXiv โ CS AI ยท Mar 97/10
๐ง Researchers propose the Disentangled Safety Hypothesis (DSH) revealing that AI safety mechanisms in large language models operate on two separate axes - recognition ('knowing') and execution ('acting'). They demonstrate how this separation can be exploited through the Refusal Erasure Attack to bypass safety controls while comparing architectural differences between Llama3.1 and Qwen2.5.
๐ง Llama
AINeutralOpenAI News ยท Sep 57/106
๐ง OpenAI has launched a Bio Bug Bounty program inviting researchers to test GPT-5's safety protocols using universal jailbreak prompts. The program offers rewards up to $25,000 for successfully identifying vulnerabilities in the upcoming AI model's biological safety measures.
AINeutralarXiv โ CS AI ยท Mar 26/1014
๐ง Researchers introduce Jailbreak Foundry (JBF), a system that automatically converts AI jailbreak research papers into executable code modules for standardized testing. The system successfully reproduced 30 attacks with high accuracy and reduces implementation code by nearly half while enabling consistent evaluation across multiple AI models.
AINeutralOpenAI News ยท Jul 176/106
๐ง OpenAI has launched a Bio Bug Bounty program inviting researchers to test ChatGPT agent's safety mechanisms using universal jailbreak prompts. The program offers rewards up to $25,000 for identifying vulnerabilities in the AI system's safety protocols.
AIBearishOpenAI News ยท Apr 196/105
๐ง Large Language Models (LLMs) currently face significant security vulnerabilities from prompt injections and jailbreaks, where attackers can override the model's original instructions with malicious prompts. This highlights a critical weakness in current AI systems' ability to maintain instruction integrity and security.