AI · Bearish · arXiv – CS AI · 10h ago · 7/10
Detecting Malicious Agent Skills in the Wild using Attention
Researchers developed Locate-and-Judge, a two-stage detection system that identifies malicious skill packages in LLM agent marketplaces by analyzing instruction-following attention patterns. The approach cuts cost by an order of magnitude compared with direct LLM scanning and flagged dozens of live malicious skills, including some that evade existing detection tools.