AI · Bullish · arXiv – CS AI · 3h ago · 7/10
Disentangling Adversarial Prompts: A Semantic-Graph Defense for Robust LLM Security
Researchers propose the Adversarial Prompt Disentanglement (APD) framework, a defense that identifies and neutralizes malicious components of LLM inputs before the model processes them. The system combines semantic decomposition, graph-based intent classification, and transformer-based detection. The authors report that it reduces harmful outputs by more than 85% while maintaining model performance.