SpatialGrammar: A Domain-Specific Language for LLM-Based 3D Indoor Scene Generation
Researchers introduce SpatialGrammar, a domain-specific language designed to improve LLM-based 3D indoor scene generation by representing layouts as bird's-eye-view grid placements with compiler validation. The approach, paired with SG-Agent (an iterative refinement system) and SG-Mini (a 104M-parameter model), significantly reduces the spatial errors and collision issues that plague existing methods for generating 3D scenes from natural language.
SpatialGrammar addresses a fundamental challenge in AI: bridging the gap between how language models understand spatial relationships and the geometric constraints required for valid 3D environments. Traditional approaches rely on raw coordinates or verbose code, forcing LLMs to infer complex spatial logic without explicit constraint enforcement. This research proposes a structured intermediate representation—a domain-specific language that encodes 3D layouts as bird's-eye-view grid placements, enabling deterministic compilation to valid geometry with built-in constraint checking.
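To make the idea concrete, here is a minimal sketch of what compiler-side constraint checking over bird's-eye-view grid placements could look like. The `Placement` syntax and the bounds/collision rules are illustrative assumptions, not the actual SpatialGrammar specification.

```python
# Hypothetical sketch of a BEV-grid "compiler" validation pass.
# Placement fields and rules are assumptions for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class Placement:
    name: str   # object identifier, e.g. "bed"
    row: int    # top-left cell of the footprint on the grid
    col: int
    rows: int   # footprint height in grid cells
    cols: int   # footprint width in grid cells

def validate(placements, grid_rows, grid_cols):
    """Return a list of constraint violations (empty list = valid scene)."""
    errors = []
    occupied = {}  # cell -> object name, for collision detection
    for p in placements:
        # Bounds check: the footprint must lie inside the room grid.
        if (p.row < 0 or p.col < 0
                or p.row + p.rows > grid_rows
                or p.col + p.cols > grid_cols):
            errors.append(f"{p.name}: out of bounds")
            continue
        # Collision check: no two footprints may share a cell.
        for r in range(p.row, p.row + p.rows):
            for c in range(p.col, p.col + p.cols):
                if (r, c) in occupied:
                    errors.append(
                        f"{p.name}: collides with {occupied[(r, c)]} at ({r}, {c})")
                else:
                    occupied[(r, c)] = p.name
    return errors
```

Because checks like these are deterministic, the LLM only has to emit discrete grid placements; the compiler either accepts them or returns explicit violations rather than silently producing implausible geometry.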
The innovation extends beyond representation design. SG-Agent implements a closed-loop feedback mechanism where compiler outputs guide iterative refinement, allowing the model to learn from constraint violations rather than generating invalid scenes outright. This mirrors how human designers work: proposing layouts, validating against constraints, then adjusting. Meanwhile, SG-Mini demonstrates that smaller models (104M parameters) trained on compiler-validated synthetic data can match or exceed larger LLM baselines, suggesting efficiency gains for deployment.
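The propose-validate-adjust loop can be sketched as follows. `propose_layout` stands in for an LLM call and `compile_and_check` for the DSL compiler; both interfaces, and the round limit, are assumptions rather than the published SG-Agent API.

```python
# Hypothetical sketch of the closed-loop refinement described above.
# Interfaces are assumed: propose_layout(feedback) -> layout,
# compile_and_check(layout) -> list of violation messages.
def refine_scene(propose_layout, compile_and_check, max_rounds=5):
    """Iteratively request a layout, feeding compiler diagnostics
    back to the generator until the scene validates or rounds run out."""
    feedback = []  # compiler diagnostics from the previous round
    for _ in range(max_rounds):
        layout = propose_layout(feedback)
        errors = compile_and_check(layout)
        if not errors:
            return layout        # compilation succeeded; scene is valid
        feedback = errors        # violations become the next round's input
    return None                  # no valid scene within the round budget
```

The key design choice is that feedback is structured (specific violations), so each round gives the generator actionable targets instead of a generic failure signal.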
For the AI ecosystem, this work signals a broader trend: purpose-built intermediate languages and constraint systems increasingly mediate between LLMs and specialized domains. The implications extend to embodied AI, gaming, and virtual reality development, where automatic scene generation could accelerate content creation. The results across 159 test scenes show measurable improvements in spatial fidelity and physical plausibility, validating the approach's practical viability. Looking forward, similar domain-specific language strategies could optimize LLM performance in robotics planning, CAD design, and other domains requiring precise spatial reasoning.
- Domain-specific languages with compiler feedback enable LLMs to generate spatially valid 3D scenes by constraining outputs during generation rather than correcting errors post hoc.
- SG-Mini's competitive performance at 104M parameters suggests smaller, specialized models can outperform larger general-purpose LLMs when paired with appropriate inductive biases.
- The closed-loop refinement system demonstrates iterative improvement via constraint violations, a pattern applicable to other spatially constrained AI tasks.
- Automatic 3D scene generation from natural language reduces manual content creation workload for gaming, VR, and embodied AI applications.
- Compiler-validated synthetic training data proves effective for developing robust models without requiring expensive human-annotated 3D scene datasets.