DiagramRAG: A Lightweight Framework to Retrieve Scientific Diagram for Figure Generation
DiagramRAG is a new retrieval-augmented framework that converts rough sketches into publication-quality scientific diagrams by retrieving semantically and topologically compatible reference diagrams. The system reports F1-scores of 0.848 and 0.802 on benchmark datasets, with inference taking 35.48 seconds per sample.
DiagramRAG addresses a genuine workflow bottleneck in academic publishing: the gap between researchers' initial conceptual sketches and the polished diagrams required for publication. Traditional sketch-based generation merely reconstructs the input, while text-driven approaches ignore structural information embedded in visual layouts. This work bridges that gap by treating diagram generation as a retrieval-augmented problem, where reference diagrams serve as both inspiration and constraint.
The technical approach tackles semantic and structural matching together. Diagrams are represented as knowledge graphs, and an embedding model is trained to align sketches with compatible references across different simplification levels, which gives the framework a more discriminating retrieval mechanism than simple content matching. The design rests on the observation that scientific diagrams encode topological relationships that matter as much as their visual content.
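The retrieval idea described above can be illustrated with a minimal sketch. It is not the DiagramRAG implementation: the trained embedding model is replaced here by two hand-written stand-ins (Jaccard overlap of node labels for the semantic side, cosine similarity of degree histograms for the topological side), and the function names, the `alpha` weight, and the toy graphs are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

# Assumption: a diagram is modelled as a list of (head, relation, tail) triples.


def node_labels(graph):
    """Return the set of lower-cased node labels used in the graph."""
    return {node.lower() for head, _, tail in graph for node in (head, tail)}


def semantic_score(a, b):
    """Jaccard overlap of node labels (stand-in for a learned embedding)."""
    labels_a, labels_b = node_labels(a), node_labels(b)
    union = labels_a | labels_b
    return len(labels_a & labels_b) / len(union) if union else 0.0


def degree_histogram(graph):
    """Count how many nodes have each degree, ignoring the labels."""
    degree = Counter()
    for head, _, tail in graph:
        degree[head] += 1
        degree[tail] += 1
    return Counter(degree.values())


def topology_score(a, b):
    """Cosine similarity between the two degree histograms."""
    hist_a, hist_b = degree_histogram(a), degree_histogram(b)
    dot = sum(hist_a[k] * hist_b[k] for k in hist_a)
    norm_a = sqrt(sum(v * v for v in hist_a.values()))
    norm_b = sqrt(sum(v * v for v in hist_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(sketch, references, alpha=0.5, top_k=1):
    """Rank reference diagrams by a weighted mix of both scores."""
    scored = [
        (alpha * semantic_score(sketch, graph)
         + (1 - alpha) * topology_score(sketch, graph), name)
        for name, graph in references.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical toy data: a three-node chain sketch and two references.
sketch = [("encoder", "feeds", "decoder"), ("decoder", "feeds", "output")]
references = {
    "pipeline": [("encoder", "to", "decoder"), ("decoder", "to", "loss")],
    "star": [("hub", "to", "a"), ("hub", "to", "b"), ("hub", "to", "c")],
}

print(retrieve(sketch, references))  # ['pipeline']
```

In this toy case the chain-shaped "pipeline" reference wins on both counts: it shares two node labels with the sketch and has the same degree profile, whereas the "star" reference shares no labels and has a different shape. A learned model would replace both scoring functions with distances in a shared embedding space.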
The benchmark results support the approach's effectiveness: a VLM-as-a-Judge score of 7.170 suggests meaningful improvements in generation quality beyond what traditional metrics capture. An inference latency of under 36 seconds per sample makes the system practical for iterative research workflows in which academics frequently refine figures. The code and datasets are available on Hugging Face, which should ease adoption and follow-up research.
The framework's impact extends beyond diagram generation. It demonstrates how retrieval augmentation can enhance creative AI tasks by supplying structural priors instead of relying on generation alone. This methodology could inform similar tools in scientific visualization, technical illustration, and domain-specific design automation, where both semantic and structural constraints matter.
- DiagramRAG combines semantic and topological matching to improve scientific diagram generation from sketches.
- The system achieves F1-scores of 0.848 and 0.802 on major benchmarks with practical 35-second inference time.
- Knowledge graph representation of diagrams enables structure-aware retrieval that outperforms text-only or sketch-only approaches.
- Framework is open-sourced with code and datasets available, enabling research community adoption and extension.
- Demonstrates retrieval-augmented generation's effectiveness for constrained creative tasks requiring both content and structure fidelity.