AIBullisharXiv โ CS AI ยท 6h ago2
๐ง
DualSentinel: A Lightweight Framework for Detecting Targeted Attacks in Black-box LLM via Dual Entropy Lull Pattern
Researchers introduce DualSentinel, a lightweight framework for detecting targeted attacks on Large Language Models by identifying 'Entropy Lull' patterns - periods of abnormally low token probability entropy that indicate when LLMs are being coercively controlled. The system uses dual-check verification to accurately detect backdoor and prompt injection attacks with near-zero false positives while maintaining minimal computational overhead.
$NEAR