y0news
#procedural-knowledge (1 article)
AI · Neutral · arXiv – CS AI · 7h ago · 6/10

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

SkillsBench is a new benchmark for evaluating Agent Skills, structured packages of procedural knowledge that enhance LLM agents. Across 86 tasks in 11 domains, curated Skills improve performance by 16.2 percentage points on average, while self-generated Skills provide no benefit.