AI · Neutral · arXiv CS AI · 7h ago · 6/10
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
SkillsBench is a new benchmark for evaluating Agent Skills, structured packages of procedural knowledge that enhance LLM agents. Across 86 tasks in 11 domains, curated Skills improve performance by 16.2 percentage points on average, while self-generated Skills provide no benefit.