Bridging the Last Mile of Circuit Design: PostEDA-Bench, a Hierarchical Benchmark for PPA Convergence and DRC Fixing
Researchers introduce PostEDA-Bench, a hierarchical benchmark for evaluating LLM-based agents in Electronic Design Automation tasks, specifically targeting Design Rule Check (DRC) fixing and Power-Performance-Area (PPA) optimization. Testing eight LLMs across 145 tasks reveals significant performance gaps, with best success rates of 36.66% for complex DRC reasoning and only 20% for multi-objective PPA optimization, indicating substantial room for improvement in AI-assisted chip design automation.
PostEDA-Bench addresses a critical gap in AI benchmarking by focusing on the final, labor-intensive stages of semiconductor design, where LLM-based agents are increasingly deployed. Unlike prior EDA benchmarks, which oversimplified evaluation through flat hierarchies and single toolchains, this work introduces a realistic hierarchical framework with machine-checkable validation across commercial and open-source tools. The benchmark reveals a stark performance cliff between synthetic tasks and real-world scenarios. LLMs struggle most in PPA-Multi scenarios, where they must balance competing design constraints and even the best success rate is only 20%.
The semiconductor industry has long sought automation for the "last mile" of circuit design, the time-consuming validation and optimization phase that consumes substantial engineering resources. As LLM capabilities expand, the pressure to apply them to EDA workflows intensifies, yet this research demonstrates that current models lack the reasoning sophistication required for production-grade automation. The finding that trade-off reasoning, rather than domain knowledge, is the primary bottleneck suggests that the limitation is not factual understanding of design rules but multi-dimensional optimization and constraint satisfaction, tasks that remain challenging for current LLM architectures.
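To make the trade-off point concrete, here is a minimal sketch of what a machine-checkable multi-objective pass criterion could look like. It is not PostEDA-Bench's actual checker: the class names, field names, units, and thresholds are all hypothetical, chosen only to illustrate why satisfying power, timing, and area at the same time is harder than meeting any one of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PPAResult:
    """Hypothetical metrics reported after an agent's design change."""
    power_mw: float
    worst_slack_ns: float  # negative slack means timing is violated
    area_um2: float


@dataclass(frozen=True)
class PPATargets:
    """Hypothetical per-task targets; not taken from the benchmark."""
    max_power_mw: float
    min_slack_ns: float
    max_area_um2: float


def violated_objectives(result: PPAResult, targets: PPATargets) -> list[str]:
    """Return the names of every objective the result fails to meet."""
    violations = []
    if result.power_mw > targets.max_power_mw:
        violations.append("power")
    if result.worst_slack_ns < targets.min_slack_ns:
        violations.append("timing")
    if result.area_um2 > targets.max_area_um2:
        violations.append("area")
    return violations


def passes_multi(result: PPAResult, targets: PPATargets) -> bool:
    """A multi-objective task passes only if no objective is violated."""
    return not violated_objectives(result, targets)


def success_rate(outcomes: list[bool]) -> float:
    """Percentage of tasks that passed; 0.0 for an empty list."""
    return 100.0 * sum(outcomes) / len(outcomes) if outcomes else 0.0


# Illustrative values only.
targets = PPATargets(max_power_mw=10.0, min_slack_ns=0.0, max_area_um2=5000.0)
fixed_timing_only = PPAResult(power_mw=9.5, worst_slack_ns=0.05, area_um2=5200.0)
balanced = PPAResult(power_mw=9.8, worst_slack_ns=0.01, area_um2=4900.0)
```

Under these assumed targets, `fixed_timing_only` meets the power and timing limits but fails the task because it exceeds the area limit, while `balanced` passes. An all-or-nothing check of this kind rewards only solutions that trade the objectives off against each other.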
This work carries implications for semiconductor tool vendors, chip design teams, and AI researchers developing specialized LLMs. Chip design companies cannot yet rely on off-the-shelf LLM agents for critical design closure tasks, so demand for traditional EDA tools and specialized engineers is likely to continue. The benchmark gives the research community a rigorous evaluation framework to drive progress in reasoning-intensive domains, potentially spurring development of task-specific LLM variants optimized for the hierarchical constraint satisfaction and multi-objective optimization problems inherent to semiconductor design.
- PostEDA-Bench introduces the first hierarchical benchmark for DRC fixing and PPA convergence, revealing significant performance gaps in LLM-based EDA agents.
- Best-in-class LLMs achieve only 36.66% success on complex DRC reasoning and 20% on multi-objective PPA optimization, indicating current models are insufficient for production chip design.
- Vision augmentation consistently improves DRC task performance, suggesting multimodal approaches may unlock better results in EDA automation.
- Trade-off reasoning rather than design knowledge is the primary bottleneck in PPA-Multi tasks, indicating reasoning capability limitations in current LLMs.
- Existing EDA-LLM benchmarks have been oversimplified, creating false confidence in LLM readiness for real-world semiconductor design workflows.