AI · Neutral · arXiv – CS AI · 10h ago · 6/10
All Green, Still Broken: Real-Flow Verification Lessons from an LLM-Integrated, Multi-Market Web Application
A production rental-search web application integrated with large language models and multi-market support accumulated 1,553 passing test cases over six weeks, yet defects continued reaching users. Analysis of 252 bug-fix commits revealed that 44% of failures occurred at integration seams—live browser runtime, non-default markets, end-to-end flows, and system-level interactions—that component-level unit tests cannot detect.