y0news

#jailbreak-analysis News & Analysis

1 article tagged with #jailbreak-analysis. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 18h ago · 6/10

Beyond Pass/Fail: Using Process Mining to Understand How LLMs Resist (and Fail) Red Team Attacks

Researchers applied process mining techniques to red team attack logs against large language models, revealing that standard attack success rate metrics mask critical differences in how models defend themselves. GPT-OSS 120B exhibits a near-absorbing refusal state, while Llama 3.3 70B shows multiple escape routes from refusal, with mutator effectiveness varying significantly across models.
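To illustrate why a single attack success rate can hide such differences, here is a minimal sketch that estimates state-to-state transition probabilities from attack traces and compares how "sticky" the refusal state is. The state names (`refuse`, `partial`, `comply`), the toy traces, and the helper functions are all hypothetical and chosen for illustration; they are not the paper's data, taxonomy, or method.

```python
from collections import Counter, defaultdict

# Hypothetical toy traces: each list is the sequence of response states
# observed over the turns of one red-team attack. Not real data.
TRACES = {
    "model_a": [
        ["refuse", "refuse", "refuse", "refuse"],
        ["partial", "refuse", "refuse", "refuse"],
        ["refuse", "partial", "comply"],
        ["refuse", "refuse", "refuse"],
    ],
    "model_b": [
        ["refuse", "partial", "refuse", "partial"],
        ["refuse", "refuse", "partial", "refuse"],
        ["refuse", "partial", "comply"],
        ["partial", "refuse", "partial"],
    ],
}


def attack_success_rate(traces):
    """Fraction of traces that end in the (assumed) 'comply' state."""
    return sum(t[-1] == "comply" for t in traces) / len(traces)


def transition_probs(traces):
    """Estimate P(next state | current state) from directly-follows counts."""
    counts = defaultdict(Counter)
    for trace in traces:
        for src, dst in zip(trace, trace[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(c.values()) for dst, n in c.items()}
        for src, c in counts.items()
    }


def self_loop(probs, state):
    """Probability of staying in `state`; values near 1 suggest a near-absorbing state."""
    return probs.get(state, {}).get(state, 0.0)


if __name__ == "__main__":
    for name, traces in TRACES.items():
        probs = transition_probs(traces)
        print(
            name,
            "ASR:", attack_success_rate(traces),
            "P(refuse -> refuse):", round(self_loop(probs, "refuse"), 3),
        )
```

In this toy data both models have the same attack success rate, yet the refusal self-loop probability differs sharply, which is the kind of structural difference a pass/fail metric cannot show.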

🧠 Llama