DAR: Deontic Reasoning with Agentic Harnesses

arXiv – CS AI | Guangyao Dou, William Jurayj, Nils Holzenberger, Benjamin Van Durme | June 4, 2026 at 04:00 AM

🤖AI Summary

Researchers introduce Deontic Agentic Reasoning (DAR), a framework that lets large language models tackle complex rule-based reasoning tasks by querying statutes and policies on demand. Testing on DeonticBench shows that agentic approaches improve performance on hard cases, though weaker models struggle with numerical reasoning and the agentic setup consumes significantly more tokens.

Analysis

Deontic reasoning—the application of explicit rules to specific facts—represents a critical frontier for AI systems that must operate in regulated environments like tax computation and legal appeals. The DAR framework addresses a fundamental limitation of current LLMs: their difficulty navigating lengthy, cross-referenced rulesets without making errors or omitting relevant provisions. By enabling models to query statutes on demand rather than relying on preloaded context, the approach mirrors how human experts actually consult legal documents during reasoning tasks.
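
To make the on-demand lookup idea concrete, here is a minimal sketch, not the paper's implementation. The ruleset, the section ids, and the names `lookup`, `cross_references`, and `run_harness` are all hypothetical, and a simple breadth-first traversal stands in for the model's decision about which provision to request next.

```python
import re
from typing import Callable

# Hypothetical toy ruleset: section ids and text are invented for illustration.
STATUTES = {
    "s1": "A filer may claim the deduction defined in s2 if eligible under s3.",
    "s2": "The deduction is 1000 units.",
    "s3": "A filer is eligible if their income is below 50000 units.",
    "s4": "Unrelated provision about filing deadlines.",
}


def lookup(section_id: str) -> str:
    """Tool exposed to the model: return one section's text on demand."""
    return STATUTES.get(section_id, f"[no such section: {section_id}]")


def cross_references(text: str) -> list[str]:
    """Find section ids mentioned in a provision (toy pattern: 's' + digits)."""
    return re.findall(r"\bs\d+\b", text)


def run_harness(
    start: str, tool: Callable[[str], str], max_calls: int = 10
) -> dict[str, str]:
    """Follow cross-references from a starting section, one tool call at a time.

    Assumption: a real harness would let the LLM choose the next query;
    this breadth-first walk is only a stand-in for that choice.
    """
    retrieved: dict[str, str] = {}
    queue = [start]
    while queue and len(retrieved) < max_calls:
        section_id = queue.pop(0)
        if section_id in retrieved:
            continue  # never fetch the same section twice
        text = tool(section_id)
        retrieved[section_id] = text
        queue.extend(ref for ref in cross_references(text) if ref not in retrieved)
    return retrieved


context = run_harness("s1", lookup)
print(sorted(context))  # sections actually consulted
```

In this toy run the unrelated section `s4` is never fetched, which is the point of querying on demand: only provisions reachable from the question enter the working context. Each lookup is still an extra tool call, which is consistent with the higher token usage reported for agentic approaches.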

This research builds on broader efforts to improve LLM reliability in high-stakes domains where incorrect reasoning carries real consequences. Previous work showed that standard prompting techniques often fail when dealing with complex regulatory frameworks, yet fine-tuning on specific domains remains expensive and inflexible. The agentic harness methodology—allowing models to actively retrieve information—provides a more scalable alternative that can adapt to different legal or regulatory systems.

The findings carry mixed implications for deployment. While stronger models benefit from the agentic approach, weaker models show degradation on numerical tasks—a critical concern for financial and tax applications where precision is non-negotiable. The substantial token consumption overhead also raises questions about cost-effectiveness at scale, particularly for organizations processing high volumes of regulatory determinations. These tradeoffs suggest DAR works best when paired with sufficiently capable base models and when computational budgets accommodate higher token usage.

Future work should focus on optimizing token efficiency and identifying which model architectures best support agentic reasoning under computational constraints. Real-world deployment will require careful model selection and threshold-setting to avoid the degradation observed in weaker variants.

Key Takeaways

→ DAR enables LLMs to query statutes dynamically rather than rely on static context, improving performance on complex rule-based reasoning tasks.
→ Stronger models benefit significantly from agentic harnesses, while weaker models often degrade on numerical reasoning despite improvements elsewhere.
→ Token consumption increases substantially with agentic approaches, raising cost and efficiency concerns for large-scale regulatory applications.
→ The framework addresses a real constraint in deploying AI to high-stakes domains like tax and immigration law, where rulesets are lengthy and heavily cross-referenced.
→ Model selection becomes critical: not all LLMs are suitable for agentic deontic reasoning, with performance gains concentrated in stronger variants.