y0news

#content-keying News & Analysis

1 article tagged with #content-keying. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · 10h ago · 7/10

Confidently Wrong: Severity-Aware Calibration of Prompt-Injection Detectors under Attack Shift

Researchers found that two popular prompt-injection detectors, ProtectAI-v2 and Prompt-Guard-2, assign extremely high confidence scores even when they fail to catch an attack, particularly for indirect behavior-hijack injections. Across multiple attack distribution shifts, the detectors missed injections while reporting 0.99–1.00 confidence, with false-negative rates ranging from 1% to 97%. This points to a critical calibration failure that standard detection metrics do not reveal.
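To make the "confidently wrong" pattern concrete, here is a minimal sketch that computes the two quantities the summary contrasts: the false-negative rate over true injections, and the detector's mean confidence on the injections it missed. A well-calibrated detector that reports 0.99 confidence should be wrong only about 1% of the time, so a high miss rate paired with near-1.00 confidence on those misses signals miscalibration. The `Prediction` record, the `miss_statistics` helper, and the toy data are hypothetical illustrations; this is not the paper's severity-aware calibration method or its evaluation code.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical record of one detector decision."""
    is_attack: bool    # ground truth: True if the input really is an injection
    flagged: bool      # detector verdict: True if it flagged the input
    confidence: float  # detector's probability for the verdict it chose


def miss_statistics(preds):
    """Return (false-negative rate, mean confidence on missed attacks)."""
    attacks = [p for p in preds if p.is_attack]
    missed = [p for p in attacks if not p.flagged]
    fnr = len(missed) / len(attacks) if attacks else 0.0
    # NaN when nothing was missed: there is no confidence to average.
    mean_conf = (
        sum(p.confidence for p in missed) / len(missed) if missed else float("nan")
    )
    return fnr, mean_conf


# Toy data (invented): three of four attacks are missed, each with near-certain confidence.
preds = [
    Prediction(is_attack=True, flagged=False, confidence=1.0),
    Prediction(is_attack=True, flagged=False, confidence=0.99),
    Prediction(is_attack=True, flagged=False, confidence=1.0),
    Prediction(is_attack=True, flagged=True, confidence=0.95),
    Prediction(is_attack=False, flagged=False, confidence=0.9),
]
fnr, conf = miss_statistics(preds)
print(f"FNR={fnr:.2f}, mean confidence on misses={conf:.3f}")
# prints: FNR=0.75, mean confidence on misses=0.997
```

Reporting confidence on the errors alone, rather than averaged over all predictions, is what exposes the problem: aggregate accuracy or AUC can look acceptable while the misses themselves carry no warning signal.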