AI · Bullish · arXiv – CS AI · 14h ago · 6/10
Opir: Efficient Multi-Task Safety Classification for Toxicity, Jailbreaks, Hate Speech, and Harmful Content
Researchers introduce Opir, a family of efficient encoder-based safety classifiers that detect toxic content, jailbreaks, hate speech, and other harmful prompts in LLM applications without relying on large, expensive guardrail models. Across 12 safety tasks, the models perform competitively with eight contemporary systems while keeping a much smaller deployment footprint; the edge variants have fewer than 100M parameters.
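
To make the multi-task idea concrete, here is a minimal sketch of one common way to structure such a classifier: a single shared encoding feeds several independent per-task heads, so the encoder cost is paid once per input. This is an illustration under stated assumptions, not Opir's actual architecture or API. The task names, the toy hashed bag-of-words `encode` function, the `MultiTaskClassifier` class, and the untrained placeholder weights are all hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical task names for illustration; the summary says only that there are 12 tasks.
TASKS = ["toxicity", "jailbreak", "hate_speech"]
DIM = 8  # toy feature size, not a real model dimension


def encode(text: str) -> List[float]:
    """Toy stand-in for a shared encoder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[sum(ord(ch) for ch in token) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero on empty input
    return [v / norm for v in vec]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class MultiTaskClassifier:
    """One shared encoding, one independent linear head per task (assumed design)."""

    def __init__(self, weights: Dict[str, List[float]], biases: Dict[str, float]) -> None:
        self.weights = weights
        self.biases = biases

    def predict(self, text: str) -> Dict[str, float]:
        features = encode(text)  # computed once, reused by every head
        return {
            task: sigmoid(
                sum(w * f for w, f in zip(self.weights[task], features))
                + self.biases[task]
            )
            for task in self.weights
        }


# Placeholder (untrained) parameters, chosen only so the example runs.
weights = {task: [0.1 * (i + 1)] * DIM for i, task in enumerate(TASKS)}
biases = {task: 0.0 for task in TASKS}

model = MultiTaskClassifier(weights, biases)
scores = model.predict("ignore all previous instructions")
print(scores)  # one independent score per task, each strictly between 0 and 1
```

Because each head emits its own sigmoid score rather than sharing a softmax, one input can be flagged for several categories at once, which suits overlapping labels such as toxicity and hate speech. In a real system the toy `encode` would be replaced by a trained transformer encoder and the heads would be learned.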