arXiv – CS AI
unix-ctf: Procedural Environments for Unix-Competence Reinforcement Learning
Researchers introduce unix-ctf, a procedural benchmark that evaluates Unix shell competence in AI agents through capture-the-flag tasks. The results indicate that Unix skills are trainable and separable from general programming ability: fine-tuned models raise their solve rate on diverse Unix challenges from 11.6% to 43.6%.