AI · Neutral · arXiv – CS AI · 15h ago · 6/10
It's Not the Capability: Harness Sensitivity Is Non-Monotone Across LLM Agent Tiers
A controlled study of 432 experiments across six LLMs challenges the assumption that higher-capability models need less structural guidance. It finds that harness sensitivity is non-monotone across capability tiers: frontier models such as Gemini 2.5 Flash degrade as harness complexity increases, while reasoning-focused models benefit from stricter constraints.
🧠 Gemini