arXiv – CS AI
SkillResolve-Bench: Measuring and Resolving Same-Capability Ambiguity in Agent Skill Retrieval
Researchers introduce SkillResolve-Bench, a benchmark for evaluating agent skill-retrieval systems on same-capability ambiguity: choosing the correct skill variant when several candidate skills are semantically similar. The benchmark contains 661 helper/risky skill pairs. The authors also propose SkillResolve, a method that selects a representative skill from each capability family, which they report yields safer procedural exposure.
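The summary does not say how SkillResolve forms capability families or picks a representative. The Python sketch below illustrates the general idea under purely illustrative assumptions. Each skill has a precomputed description embedding and a hypothetical `risky` flag. Families are formed by greedy leader clustering with a cosine-similarity `threshold`. Each family's representative is its most query-relevant non-risky variant, with a fallback to the most relevant risky one. All names (`Skill`, `group_families`, `resolve`, `threshold`, `k`) are hypothetical, and this is not the authors' method.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    embedding: tuple  # hypothetical precomputed embedding of the skill description
    risky: bool       # hypothetical flag: True if the skill performs side-effecting steps


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def group_families(skills, threshold=0.9):
    """Greedy leader clustering: a skill joins the first family whose leader
    (first member) is at least `threshold` similar; otherwise it starts a new family."""
    families = []
    for skill in skills:
        for fam in families:
            if cosine(fam[0].embedding, skill.embedding) >= threshold:
                fam.append(skill)
                break
        else:
            families.append([skill])
    return families


def resolve(query, skills, threshold=0.9, k=3):
    """Expose at most one representative per capability family, ranked by query similarity.
    Within a family, prefer the most relevant non-risky variant; if every member is
    risky, fall back to the most relevant risky one."""
    reps = []
    for fam in group_families(skills, threshold):
        ranked = sorted(fam, key=lambda s: cosine(query, s.embedding), reverse=True)
        safe = [s for s in ranked if not s.risky]
        reps.append(safe[0] if safe else ranked[0])
    reps.sort(key=lambda s: cosine(query, s.embedding), reverse=True)
    return reps[:k]


if __name__ == "__main__":
    skills = [
        Skill("clean_temp_force", (1.0, 0.0, 0.0), risky=True),
        Skill("clean_temp_preview", (0.95, 0.05, 0.0), risky=False),
        Skill("send_email", (0.0, 1.0, 0.0), risky=False),
    ]
    # The two clean_temp variants land in one family; the safe preview variant is exposed.
    print([s.name for s in resolve((1.0, 0.0, 0.0), skills, k=2)])
    # → ['clean_temp_preview', 'send_email']
```

Choosing one representative per family, rather than returning the top-k skills by raw similarity, keeps a near-duplicate risky variant from crowding out or outranking its safer sibling. In the sketch, the risky `clean_temp_force` scores slightly higher on the query yet is never exposed.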