RTL-BenchLS: A Large-Scale Benchmark for RTL Reasoning and Generation with Large Language Models
Researchers introduce RTL-BenchLS, a large-scale benchmark of more than 10,000 formally verified Verilog designs for evaluating large language models on hardware design tasks. To address limitations of existing datasets, the benchmark adds three novel self-supervised tasks that go beyond specification-to-RTL generation. Top models achieve only 12-28% accuracy, which leaves substantial room for improvement in LLM-based hardware automation.