AINeutralarXiv โ CS AI ยท 10h ago7/10
๐ง
WebCoderBench: Benchmarking Web Application Generation with Comprehensive and Interpretable Evaluation Metrics
Researchers introduced WebCoderBench, the first comprehensive benchmark for evaluating web application generation by large language models, featuring 1,572 real-world user requirements and 24 evaluation metrics. The benchmark tests 12 representative LLMs and shows no single model dominates across all metrics, providing opportunities for targeted improvements.