AI · Neutral · arXiv – CS AI · 10h ago · 6/10
Scalable Hierarchical Attention Transformers for Multi-Turn Jailbreak Detection in Long Conversations
Researchers introduce a hierarchical attention transformer that detects multi-turn jailbreak attempts in long conversations by analyzing dialogue patterns rather than processing entire transcripts at once. The model achieves an F1 score of 93.94%, surpassing Claude Opus while reducing false positives by 50%, and addresses a critical gap in AI safety systems that process conversations turn by turn.