From RAG to Agentic RAG for Faithful Islamic Question Answering

arXiv – CS AI | Gagan Bhatia, Hamdy Mubarak, Mustafa Jarrar, George Mikros, Fadi Zaraket, Mahmoud Alhirthani, Mutaz Al-Khatib, Logan Cochrane, Kareem Darwish, Rashid Yahiaoui, Firoj Alam | June 23, 2026 at 04:00 AM

🤖 AI Summary

Researchers introduced IslamicFaithQA, a 3,810-item bilingual benchmark, together with an agentic RAG framework designed to improve the accuracy and reliability of Islamic question-answering systems. The work addresses gaps in LLM evaluation by measuring hallucination rates and abstention behavior, and it reports state-of-the-art performance from iterative evidence seeking grounded in Qur'anic text.

Analysis

This research tackles a consequential problem at the intersection of AI safety and religious applications. Islamic question-answering systems present unique challenges because incorrect responses can carry serious theological and practical implications for believers. Traditional multiple-choice question (MCQ) and machine reading comprehension (MRC) evaluations fail to capture real-world failure modes such as free-form hallucination, or whether a system declines to answer when evidence is insufficient. Both behaviors are critical for religious applications.
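
To make the distinction between accuracy, hallucination, and abstention concrete, here is a minimal sketch of how the three rates could be computed over single-gold-answer items. It is an illustration only, not the paper's evaluation protocol: the `Item` type, the `ABSTAIN` marker, and the exact-match comparison are all assumptions, and a real evaluation of free-form answers would likely need a more forgiving matcher.

```python
from dataclasses import dataclass

# Hypothetical abstention marker; a real system would define its own.
ABSTAIN = "insufficient evidence"


@dataclass
class Item:
    gold: str        # the single gold answer for the question
    prediction: str  # the system's free-form answer


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def score(items: list[Item]) -> dict[str, float]:
    """Return the share of correct, abstained, and hallucinated answers.

    Assumption: any answer that is neither an abstention nor an exact
    match with the gold answer is counted as a hallucination.
    """
    counts = {"correct": 0, "abstained": 0, "hallucinated": 0}
    for item in items:
        pred = normalize(item.prediction)
        if pred == normalize(ABSTAIN):
            counts["abstained"] += 1
        elif pred == normalize(item.gold):
            counts["correct"] += 1
        else:
            counts["hallucinated"] += 1
    total = len(items) or 1  # avoid division by zero on an empty list
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    # Placeholder items, not taken from the benchmark.
    demo = [
        Item(gold="a", prediction="A"),
        Item(gold="b", prediction=" b "),
        Item(gold="c", prediction="Insufficient  evidence"),
        Item(gold="d", prediction="x"),
    ]
    print(score(demo))
```

Reporting the three rates separately matters because a multiple-choice score cannot tell a confident wrong answer apart from a justified refusal.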

The work represents a methodological advance in building trustworthy AI systems for specialized domains. By developing IslamicFaithQA with atomic single-gold answers, the researchers enable precise measurement of hallucination and abstention behavior. The accompanying datasets (25K Arabic text-grounded reasoning pairs and 5K bilingual preference samples) provide concrete resources for training aligned models. The agentic RAG approach differs from standard retrieval-augmented generation by using structured tool calls for iterative evidence seeking and answer revision, mimicking the scholarly process of consulting sources.
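
The difference from single-pass retrieval can be sketched as a small control loop. The example below is a toy under stated assumptions, not the authors' framework: the `search` and `read` tools, the action dictionaries, and `scripted_policy` (a rule-based stand-in for the LLM) are hypothetical, and the corpus holds invented neutral passages rather than any real text. It covers iterative evidence seeking, one query revision, and abstention; answer revision is left out for brevity.

```python
# Placeholder corpus: invented neutral passages used only for illustration.
CORPUS = {
    "p1": "the museum opens at nine on weekdays",
    "p2": "the museum is closed on public holidays",
    "p3": "tickets are sold at the north gate",
}


def search(query: str, k: int = 2) -> list[str]:
    """Hypothetical tool: rank passage ids by word overlap with the query."""
    words = set(query.lower().split())
    scored = [(len(words & set(text.split())), pid) for pid, text in CORPUS.items()]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [pid for _, pid in ranked[:k]]


def read(pid: str) -> str:
    """Hypothetical tool: return the full text of one passage."""
    return CORPUS[pid]


TOOLS = {"search": search, "read": read}


def scripted_policy(question: str, history: list[dict]) -> dict:
    """Stand-in for the LLM: emit one structured action per turn."""
    if not history:
        return {"tool": "search", "args": {"query": question}}
    last = history[-1]
    if last["tool"] == "search":
        if last["result"]:
            # Evidence candidates found: read the top-ranked passage.
            return {"tool": "read", "args": {"pid": last["result"][0]}}
        if len(history) == 1:
            # First search failed: revise the query once by dropping punctuation.
            cleaned = "".join(c for c in question.lower() if c.isalnum() or c.isspace())
            return {"tool": "search", "args": {"query": cleaned}}
        # Still no evidence after the revision: decline to answer.
        return {"tool": "abstain", "args": {}}
    # The last action was a read: answer with the passage and cite its id.
    return {"tool": "answer", "args": {"text": last["result"], "cited": last["args"]["pid"]}}


def run_agent(question: str, policy=scripted_policy, max_steps: int = 6) -> dict:
    """Dispatch tool calls until the policy answers, abstains, or runs out of steps."""
    history: list[dict] = []
    for _ in range(max_steps):
        action = policy(question, history)
        if action["tool"] in ("answer", "abstain"):
            return action
        result = TOOLS[action["tool"]](**action["args"])
        history.append({**action, "result": result})
    return {"tool": "abstain", "args": {}}


if __name__ == "__main__":
    print(run_agent("when is the museum closed"))  # answers from p2
    print(run_agent("Tickets?"))                   # needs the revised query
    print(run_agent("parking?"))                   # abstains: no evidence
```

In this toy, a single-pass retriever would return nothing for "Tickets?", while the loop recovers after revising the query, and it abstains rather than guessing when no passage supports an answer.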

From an industry perspective, this demonstrates how domain-specific benchmarks and training data can improve LLM performance even on smaller models like Qwen3 4B. The bilingual focus and public dataset release signal growing recognition that non-English applications require dedicated research investment. The framework could serve as a template for other specialized knowledge domains where accuracy and grounding matter significantly.

Looking forward, the adoption of agentic approaches in domain-specific AI applications may accelerate. The reported results indicate that iterative evidence seeking outperforms single-pass retrieval in this setting, which suggests that future LLM systems may increasingly employ agent-based architectures for high-stakes applications across religious, medical, and legal domains.

Key Takeaways

→ The IslamicFaithQA benchmark enables direct measurement of hallucination and abstention in Islamic QA systems, addressing gaps in traditional MCQ/MRC evaluations.
→ The agentic RAG framework, which uses iterative tool calls, achieves superior performance compared to standard RAG for grounded religious question answering.
→ Publicly released datasets, including 25K Arabic text-grounded reasoning pairs, provide foundational resources for building faithful Islamic AI systems.
→ Agentic approaches demonstrate meaningful improvements even on small models (4B parameters), suggesting suitability for resource-constrained deployment.
→ The bilingual (Arabic/English) focus and verse-level Qur'an corpus establish a methodological precedent for domain-specific multilingual AI applications.
