Retrieval Augmented Generation Framework for the Nepali Legal Domain Question Answering
Researchers have developed the first Retrieval Augmented Generation (RAG) system for legal question answering in Nepali, addressing a critical gap in AI applications for low-resource languages. The system achieved 91% retrieval precision using BM25 and 84% human-evaluated truthfulness, establishing a viable foundation for AI-assisted legal services in non-English-speaking jurisdictions.
This research represents a meaningful advancement in democratizing AI capabilities across linguistic boundaries. While high-resource languages like English have benefited from sophisticated legal AI systems for years, Nepali and similar low-resource languages have remained underserved due to limited training data and computational resources. This study directly addresses that imbalance by leveraging Retrieval Augmented Generation, which retrieves relevant documents before generating answers rather than relying solely on pre-trained model parameters.
The technical achievement is noteworthy because RAG approaches are particularly well-suited for legal applications where accuracy and verifiability are paramount. By grounding responses in actual case law from the Nepal Kanun Patrika digital archive, the system provides traceable reasoning rather than generating potentially hallucinated legal interpretations. The 84% human-evaluated truthfulness rate demonstrates practical viability, though it also means that roughly one in six answers was not judged truthful, so legal professionals would still need to verify outputs against the cited sources.
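The retrieve-then-generate pattern described above can be sketched in a few lines. This is a minimal illustration, not the researchers' implementation: the function names (`retrieve`, `build_prompt`, `answer`), the word-overlap scoring, and the English placeholder passages are all assumptions made for the example, and `generate` stands in for whatever language model is used.

```python
from typing import Callable, List


def retrieve(question: str, corpus: List[str], k: int = 2) -> List[str]:
    """Return the k passages sharing the most words with the question.

    Word overlap is a deliberately crude stand-in for a real retriever.
    """
    q_terms = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(q_terms & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, passages: List[str]) -> str:
    """Number each passage so the generated answer can cite its sources."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer(question: str, corpus: List[str],
           generate: Callable[[str], str], k: int = 2) -> str:
    """Retrieve first, then hand the grounded prompt to the generator."""
    return generate(build_prompt(question, retrieve(question, corpus, k)))


# Hypothetical placeholder passages, not drawn from any real archive.
toy_corpus = [
    "The court ruled on a land dispute in 2015",
    "A contract breach requires damages",
    "Land registration tax rules",
]
```

Because the passages are numbered in the prompt, a reviewer can check each claim in the answer against the source it cites, which is the traceability property that matters for legal use.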
For developing economies, this work signals that sophisticated AI infrastructure doesn't require massive proprietary datasets or enormous computational budgets. That BM25, a decades-old ranking algorithm, competed effectively with modern multilingual embeddings suggests that pragmatic, cost-effective solutions can deliver substantial value. This has implications for how other low-resource language communities might approach AI adoption in specialized domains.
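To make the cost argument concrete, here is a minimal sketch of BM25 scoring in plain Python. It uses the common Okapi BM25 formulation with a smoothed IDF; the parameter defaults (`k1=1.5`, `b=0.75`), the function name `bm25_scores`, and the toy English tokens are assumptions for illustration and say nothing about the configuration or tokenizer used in the study.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each tokenized document against the tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF: rarer terms contribute more, never negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Hypothetical toy documents, already tokenized.
toy_docs = [
    ["land", "ownership", "dispute", "court"],
    ["contract", "breach", "damages"],
    ["land", "tax", "registration"],
]
toy_scores = bm25_scores(["land", "dispute"], toy_docs)
```

With this toy input, the first document matches both query terms and ranks highest, the third matches only one, and the second scores zero. Nothing here needs a GPU or a trained model, only term counts over the corpus, which is why BM25 remains a cheap baseline.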
Looking forward, the critical next steps involve testing the system's performance on edge cases, integrating it with existing legal workflows, and addressing potential biases in historical case law. Scaling this approach to other low-resource languages and domains could establish templates for inclusive AI development globally.
- First RAG-based legal QA system for Nepali achieves 91% retrieval precision and 84% human-verified answer accuracy
- BM25 document retrieval outperformed modern multilingual embeddings, suggesting cost-effective AI solutions work for low-resource languages
- System generates 92% successful answers with strong groundedness metrics, demonstrating practical viability for legal professionals
- RAG approach proves effective for specialized domains where accuracy and source attribution are essential requirements
- Framework provides replicable methodology for deploying AI legal systems in other underserved linguistic and jurisdictional contexts