AI · Neutral · arXiv – CS AI · 7h ago · 6/10
Layer-Isolated Evaluation: Gating the Deterministic Scaffold of a Production LLM Agent with a No-LLM, Regression-Locked Test Harness
Researchers present layer-isolated evaluation, a deterministic testing framework that decomposes an LLM agent into eight functional layers and validates each one independently, without invoking an LLM. Across 238 test cases, aggregate end-to-end metrics mask localized regressions: a targeted failure in a single layer causes a 25 to 91 percentage-point drop in that layer's tests while barely affecting the overall pass rate.
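The masking effect can be illustrated with a minimal sketch. It is not the paper's harness: the layer names, case counts, and the helpers `pass_rates`, `make_results`, and `regressed_layers` are all hypothetical, and the results are synthetic. It compares per-layer pass rates against a locked baseline and shows how a failure confined to one small layer barely changes the aggregate rate.

```python
from collections import defaultdict


def pass_rates(results):
    """Return (per-layer pass rate dict, aggregate pass rate) from (layer, passed) pairs."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for layer, passed in results:
        totals[layer] += 1
        passes[layer] += int(passed)
    per_layer = {layer: passes[layer] / totals[layer] for layer in totals}
    aggregate = sum(passes.values()) / sum(totals.values())
    return per_layer, aggregate


def make_results(case_counts, failures=None):
    """Build synthetic results: the first `failures[layer]` cases of a layer fail."""
    failures = failures or {}
    results = []
    for layer, n in case_counts.items():
        failed = failures.get(layer, 0)
        results.extend((layer, i >= failed) for i in range(n))
    return results


def regressed_layers(baseline, current, max_drop_pp=1.0):
    """List layers whose pass rate fell more than max_drop_pp points below baseline."""
    return [
        layer
        for layer, base_rate in baseline.items()
        if (base_rate - current.get(layer, 0.0)) * 100 > max_drop_pp
    ]


# Hypothetical layer names and case counts, for illustration only
# (the summary does not name the eight layers or their sizes).
CASE_COUNTS = {"routing": 10, "tool_schema": 20, "state": 30, "output_format": 40}

# Locked baseline: every case passes. Current run: 5 routing cases fail.
baseline_layers, baseline_total = pass_rates(make_results(CASE_COUNTS))
current_layers, current_total = pass_rates(make_results(CASE_COUNTS, {"routing": 5}))

layer_drop_pp = (baseline_layers["routing"] - current_layers["routing"]) * 100
total_drop_pp = (baseline_total - current_total) * 100
flagged = regressed_layers(baseline_layers, current_layers)
```

In this synthetic setup the `routing` layer loses 50 percentage points while the aggregate pass rate falls by only 5, so a gate on the aggregate alone could let the regression through, whereas the per-layer comparison flags it.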