From Statute to Control Flow: Span-Grounded Deontic Trees for Defeasible Scope Parsing
Researchers introduce NormBench, a benchmark of 2,290 legal provisions across multiple languages, and Span-Grounded Deontic Trees (SG-DT), a structured representation designed to address Silent Scope Omission, where AI systems appear compliant but fail to apply nested exceptions correctly. Testing shows that frontier LLMs struggle with recursive defeater chains and often fail to assemble correct logical control flow even when they retrieve the relevant source material.
This research addresses a critical failure mode in AI systems tasked with interpreting legal and policy documents. Silent Scope Omission represents a subtle yet dangerous vulnerability: outputs appear correct at first glance but silently drop edge cases and exceptions, creating compliance risks for organizations deploying AI in regulated environments. Rather than framing this as purely an agentic-systems problem, the authors correctly identify the root cause as deficient statutory understanding at the NLP layer.
The paper emerges from the intersection of legal AI and interpretability concerns. Existing legal NLP benchmarks focus on end-task performance—determining outcomes correctly—but fail to measure whether models actually understand the hierarchical structure of defeaters and exceptions. NormBench introduces a compiler-style representation (SG-DT) that forces explicit reasoning about which clauses override which, making the control flow auditable and deterministic.
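To make the idea of a compiler-style, span-grounded representation concrete, here is a minimal Python sketch. It is not the paper's SG-DT schema: the `Node` class, its fields, the `span_of` helper, and the park-vehicle provision are all hypothetical, chosen only to show how each clause can point back to a character span in the source text and how an exception can itself be defeated by a deeper exception.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Facts = Dict[str, bool]

# Hypothetical provision, invented for illustration (not taken from NormBench).
TEXT = ("Vehicles are prohibited in the park, except emergency vehicles, "
        "unless the vehicle is off duty.")


def span_of(phrase: str) -> Tuple[int, int]:
    """Return the (start, end) character offsets of a phrase in TEXT."""
    start = TEXT.index(phrase)
    return (start, start + len(phrase))


@dataclass
class Node:
    """One clause in an illustrative span-grounded deontic tree."""
    label: str                          # role of the clause, e.g. "prohibition"
    span: Tuple[int, int]               # offsets grounding the clause in TEXT
    condition: Callable[[Facts], bool]  # when the clause is triggered
    defeaters: List["Node"] = field(default_factory=list)

    def active(self, facts: Facts) -> bool:
        # A clause applies when its own condition holds and
        # none of its defeaters is itself active.
        if not self.condition(facts):
            return False
        return not any(d.active(facts) for d in self.defeaters)

    def depth(self) -> int:
        # Length of the longest defeater chain rooted at this clause.
        return 1 + max((d.depth() for d in self.defeaters), default=0)


off_duty = Node("exception-to-exception",
                span_of("unless the vehicle is off duty"),
                lambda f: f.get("off_duty", False))
emergency = Node("exception",
                 span_of("except emergency vehicles"),
                 lambda f: f.get("emergency", False),
                 [off_duty])
rule = Node("prohibition",
            span_of("Vehicles are prohibited in the park"),
            lambda f: f.get("is_vehicle", False),
            [emergency])

# The prohibition holds for an ordinary vehicle, is lifted for an emergency
# vehicle, and is restored when that emergency vehicle is off duty.
ordinary = rule.active({"is_vehicle": True})
on_duty = rule.active({"is_vehicle": True, "emergency": True})
off = rule.active({"is_vehicle": True, "emergency": True, "off_duty": True})
```

Dropping the `off_duty` node from this tree would still produce a plausible-looking answer for most inputs, which is the kind of silent omission the benchmark is built to expose. Because every node carries a span, a reviewer can check each override against the exact text it came from.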
The identified pathologies—Recursion Decay and the Auditability Trap—have immediate implications for compliance automation. Recursion Decay shows that as exception nesting deepens, model accuracy degrades sharply, suggesting AI systems may be unreliable for complex regulatory frameworks. The Auditability Trap is particularly revealing: models retrieve correct source material but fail to assemble the logical dependencies, indicating the problem lies in structural reasoning rather than information retrieval.
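One way to look for Recursion Decay in an evaluation run is to group per-provision correctness by defeater depth. The sketch below assumes a hypothetical result format of `(depth, is_correct)` pairs; the function name and the toy records are placeholders invented for illustration and are not NormBench data or results from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_depth(results: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Group (defeater_depth, is_correct) records and return accuracy per depth."""
    totals: Dict[int, int] = defaultdict(int)
    correct: Dict[int, int] = defaultdict(int)
    for depth, ok in results:
        totals[depth] += 1
        correct[depth] += int(ok)
    # Sort by depth so any downward trend is easy to read off.
    return {d: correct[d] / totals[d] for d in sorted(totals)}


# Made-up placeholder records, only to show the shape of the input.
toy_results = [(1, True), (1, True), (2, True), (2, False), (3, False), (3, False)]
per_depth = accuracy_by_depth(toy_results)
```

A per-depth breakdown like this separates failures on deeply nested exceptions from overall end-task accuracy, which an aggregate score would blur together.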
Using SG-DT as a constrained intermediate representation improves performance on exception-heavy cases, though gains are mechanism-specific and depend on parser fidelity. This suggests that intermediate structured representations can mitigate—but not eliminate—fundamental limitations in LLM reasoning over legal text.
- Silent Scope Omission is a critical failure mode where AI systems miss nested exceptions in legal documents despite appearing compliant
- Frontier LLMs exhibit Recursion Decay, with sharply declining accuracy as defeater depth increases in policy hierarchies
- Span-Grounded Deontic Trees enable deterministic, auditable compilation of legal control flow from source text
- Models can retrieve relevant clauses but fail to assemble correct logical dependencies between them
- Structured intermediate representations improve performance on exception-heavy regulatory language but not uniformly across all tasks