y0news
#prompt-injection3 articles
3 articles
AIBullisharXiv โ€“ CS AI ยท 5h ago2
๐Ÿง 

Tracking Capabilities for Safer Agents

Researchers propose a new safety framework for AI agents using Scala 3 with capture checking to prevent information leakage and malicious behaviors. The system creates a 'safety harness' that tracks capabilities through static type checking, allowing fine-grained control over agent actions while maintaining task performance.

AIBearisharXiv โ€“ CS AI ยท 5h ago1
๐Ÿง 

Reverse CAPTCHA: Evaluating LLM Susceptibility to Invisible Unicode Instruction Injection

Researchers developed 'Reverse CAPTCHA,' a framework that tests how large language models respond to invisible Unicode-encoded instructions embedded in normal text. The study found that AI models can follow hidden instructions that humans cannot see, with tool use dramatically increasing compliance rates and different AI providers showing distinct preferences for encoding schemes.

AIBullisharXiv โ€“ CS AI ยท 5h ago2
๐Ÿง 

DualSentinel: A Lightweight Framework for Detecting Targeted Attacks in Black-box LLM via Dual Entropy Lull Pattern

Researchers introduce DualSentinel, a lightweight framework for detecting targeted attacks on Large Language Models by identifying 'Entropy Lull' patterns - periods of abnormally low token probability entropy that indicate when LLMs are being coercively controlled. The system uses dual-check verification to accurately detect backdoor and prompt injection attacks with near-zero false positives while maintaining minimal computational overhead.

$NEAR