Harnessing Hyperbolic Geometry for Harmful Prompt Detection and Sanitization
Researchers propose HyPE and HyPS, a two-part defense framework using hyperbolic geometry to detect and neutralize harmful prompts in Vision-Language Models. The approach offers a lightweight, interpretable alternative to blacklist filters and classifier-based systems that are vulnerable to adversarial attacks.
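The summary does not say how HyPE scores a prompt. The sketch below only illustrates the basic geometric ingredient such a detector could build on: measuring geodesic distance in the Poincaré ball, a common model of hyperbolic space. It is not the authors' method. It assumes prompt embeddings already live inside the unit ball and that a small set of benign reference embeddings is available. The names `poincare_distance` and `is_flagged`, the nearest-benign-distance rule, the threshold, and the toy 2-D points are all hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def poincare_distance(u: Vector, v: Vector) -> float:
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def is_flagged(prompt_emb: Vector, benign_refs: Sequence[Vector], threshold: float) -> bool:
    """Hypothetical outlier rule: flag a prompt whose nearest benign reference
    lies farther away than `threshold` in hyperbolic distance."""
    nearest = min(poincare_distance(prompt_emb, ref) for ref in benign_refs)
    return nearest > threshold


if __name__ == "__main__":
    # Toy benign embeddings clustered near the origin (hypothetical data).
    benign = [(0.1, 0.0), (0.0, 0.1), (-0.1, 0.05)]
    print(is_flagged((0.05, 0.05), benign, threshold=1.0))  # close to the benign cluster
    print(is_flagged((0.9, -0.3), benign, threshold=1.0))   # near the boundary, far away
```

Because the denominator shrinks as a point approaches the boundary of the ball, distances grow rapidly there. This is why hyperbolic embeddings can separate a compact benign region from points that sit far outside it. Any real detector would need a trained encoder and a calibrated decision rule, neither of which this sketch provides.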