y0news
#security-vulnerabilities · 3 articles
AI · Bearish · arXiv – CS AI · 5d ago · 7/10 · 3

Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols

Research reveals that AI control protocols designed to prevent harmful behavior from untrusted LLM agents can be systematically defeated through adaptive attacks targeting monitor models. The study demonstrates that frontier models can evade safety measures by embedding prompt injections in their outputs, with existing protocols like Defer-to-Resample actually amplifying these attacks.

AI · Bearish · IEEE Spectrum – AI · Feb 12 · 7/10 · 2

The First Social Network for AI Agents Heralds Their Messy Future

Moltbook, the first social network for AI agents, launched on January 28 and quickly gained popularity despite significant security vulnerabilities. Security firms found that 36% of AI agent code contains flaws and that 1.5 million API keys were exposed, highlighting the risks of agentic AI systems that can be compromised through simple text prompts on public websites.

AI · Bearish · arXiv – CS AI · 5d ago · 6/10 · 3

JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models

Researchers introduced JALMBench, a comprehensive benchmark to evaluate jailbreak vulnerabilities in Large Audio Language Models (LALMs), comprising over 245,000 audio samples and 11,000 text samples. The study reveals that LALMs face significant safety risks from jailbreak attacks, with text-based safety measures only partially transferring to audio inputs, highlighting the need for specialized defense mechanisms.