AI · Neutral · arXiv – CS AI · 4h ago · 6/10
DeepTest Tool Competition 2026: Benchmarking an LLM-Based Automotive Assistant
The first LLM testing competition, held at ICSE 2026's DeepTest workshop, evaluated four tools designed to benchmark an LLM-based automotive assistant. The tools were judged on their ability to find failure cases in which the assistant does not surface critical safety warnings from car manuals. The competition assessed both how effectively the tools discovered failures and how diverse those failures were, establishing a benchmark for evaluating AI testing methodologies in safety-critical applications.