AI · Neutral · arXiv – CS AI · 6h ago · 6/10
CollabSkill: Evaluating Human-Agent Collaboration On Real-World Tasks
Researchers introduce CollabSkill, a framework for evaluating how AI agents perform when collaborating with real human workers on occupational tasks. Using data from 93 workers across 386 sessions, the study finds that Claude Code outperforms Codex in practical collaboration scenarios, a result that diverges from autonomous benchmark rankings, and identifies hands-on experience as the primary driver of effective human-AI teamwork.