AIBearisharXiv โ CS AI ยท 7h ago7/10
๐ง
MalURLBench: A Benchmark Evaluating Agents' Vulnerabilities When Processing Web URLs
Researchers have released MalURLBench, the first benchmark to evaluate how LLM-based web agents handle malicious URLs, revealing significant vulnerabilities across 12 popular models. The study found that existing AI agents struggle to detect disguised malicious URLs and proposed URLGuard as a defensive solution.