FineREX: Fine-Tuned NER-RE for Human Smuggling Knowledge Graphs
FineREX introduces a fine-tuned language model pipeline that extracts entities and relationships from court documents to build knowledge graphs of human smuggling networks. The domain-specific approach improves F1 scores by 15-31% over general-purpose baseline models while halving processing time, suggesting that a specialized model can outperform larger generalist systems in legal document analysis.
FineREX addresses a critical gap in applying artificial intelligence to law enforcement and legal intelligence work. While large language models have demonstrated broad capabilities across industries, this research highlights their limitations in highly specialized domains with distinctive terminology and relationship structures. The team's fine-tuned model, trained on just 512 manually annotated chunks of legal text, substantially outperforms much larger baseline models, improving F1 scores by 15.50% on entity extraction and 31.46% on relationship extraction. This counterintuitive result challenges the common assumption that bigger models always perform better.
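The F1 figures above combine precision and recall over extracted items. This summary does not describe FineREX's exact matching rules, so the sketch below assumes exact-match scoring, treating entities as `(text, type)` pairs and relationships as `(head, relation, tail)` triples. The `prf1` helper, the sample annotations, and relation labels such as `TRANSPORTED` are hypothetical.

```python
from collections.abc import Hashable, Iterable


def prf1(gold: Iterable[Hashable], pred: Iterable[Hashable]) -> tuple[float, float, float]:
    """Exact-match precision, recall and F1 over two collections of items.

    Duplicates are collapsed: each distinct item counts once.
    """
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)  # predicted items that also appear in the gold annotation
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical annotations for one court-document chunk (illustrative only).
gold_entities = {("John Doe", "PERSON"), ("Laredo", "LOCATION"), ("Jane Roe", "PERSON")}
pred_entities = {("John Doe", "PERSON"), ("Laredo", "LOCATION"), ("Houston", "LOCATION")}

gold_relations = {("John Doe", "TRANSPORTED", "Jane Roe"), ("John Doe", "OPERATED_IN", "Laredo")}
pred_relations = {("John Doe", "TRANSPORTED", "Jane Roe")}

for label, gold, pred in [("entities", gold_entities, pred_entities),
                          ("relations", gold_relations, pred_relations)]:
    p, r, f = prf1(gold, pred)
    print(f"{label}: P={p:.3f} R={r:.3f} F1={f:.3f}")
```

Under this scoring a relationship counts only when its head, label, and tail all match, which is one reason relationship extraction is typically the harder of the two tasks.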
The practical implications extend beyond academic interest. Court proceedings are an underused source of information about illicit networks, yet manually extracting facts from jargon-heavy legal documents consumes substantial investigative resources. FineREX halves processing time while also improving output quality: it reduces legal noise by nearly 50% and lowers node duplication from 17.78% to 11.17% on extended documents. These efficiency gains matter for law enforcement agencies operating under resource constraints.
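Node duplication can be made concrete with a small sketch. The definition here is an assumption, not necessarily the one FineREX uses: a node counts as a duplicate when its normalized name and type match a node already in the graph, and the rate is duplicates divided by total nodes. The `canonical_key` normalization (case-folding, punctuation removal, whitespace collapsing) is hypothetical.

```python
import re
from collections import Counter


def canonical_key(name: str, label: str) -> tuple[str, str]:
    """Hypothetical normalization: case-fold, drop punctuation, collapse spaces."""
    cleaned = re.sub(r"[^\w\s]", "", name.casefold())
    return " ".join(cleaned.split()), label


def duplication_rate(nodes: list[tuple[str, str]]) -> float:
    """Share of nodes that collapse onto an already-seen canonical key."""
    if not nodes:
        return 0.0
    counts = Counter(canonical_key(name, label) for name, label in nodes)
    redundant = sum(c - 1 for c in counts.values())  # every copy beyond the first
    return redundant / len(nodes)


# Hypothetical node list extracted from one document.
nodes = [
    ("John Doe", "PERSON"),
    ("JOHN DOE", "PERSON"),
    ("Doe, John", "PERSON"),  # word-order variant: not caught by this simple key
    ("Laredo", "LOCATION"),
    ("Laredo.", "LOCATION"),
]
print(f"{duplication_rate(nodes):.2%}")
```

The sample prints `40.00%`: two of the five nodes collapse onto existing keys, while the word-order variant `Doe, John` slips through, the kind of residual duplication that a better extractor or entity resolver has to catch. For scale, the reported drop from 17.78% to 11.17% is 6.61 percentage points, roughly a 37% relative reduction.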
The research supports domain-specific fine-tuning as a scalable strategy for government and institutional applications. Rather than deploying massive general-purpose models, organizations handling specialized text (legal, medical, financial) may achieve better results by fine-tuning on smaller, representative datasets. This approach offers cost and efficiency advantages for enterprise deployments where accuracy directly affects operations. The methodology provides a replicable framework for applying AI to other regulated domains that require precise information extraction from complex technical documents.
- →Fine-tuned domain-specific LLMs outperform larger general-purpose models by 15-31% in F1 score on legal entity and relationship extraction
- →Knowledge graph quality improved, with legal noise reduced by nearly 50% and node duplication falling from 17.78% to 11.17%
- →End-to-end processing time decreased 50% by eliminating redundant document rewriting and extraction stages
- →The approach demonstrates scalability for law enforcement and institutional applications analyzing illicit networks
- →Targeted fine-tuning on 512 annotated examples proves more effective than deploying substantially larger baseline models
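The processing-time saving listed above comes from removing redundant rewriting and extraction stages. A minimal sketch of a single-pass design, assuming one joint entity-and-relationship call per chunk whose output is merged straight into the graph, might look like the following. `fake_extract` and its gazetteer are hypothetical stand-ins for the fine-tuned model, the sample text is invented, and FineREX's actual pipeline may differ.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class Extraction:
    entities: list[tuple[str, str]]        # (name, entity type)
    relations: list[tuple[str, str, str]]  # (head, relation, tail)


def normalize(name: str) -> str:
    # Case-fold and collapse whitespace so repeated mentions map to one node.
    return " ".join(name.casefold().split())


@dataclass
class KnowledgeGraph:
    nodes: dict[str, str] = field(default_factory=dict)  # normalized name -> type
    edges: set[tuple[str, str, str]] = field(default_factory=set)

    def merge(self, ext: Extraction) -> None:
        # setdefault keeps the first type seen for a node; later mentions are merged.
        for name, etype in ext.entities:
            self.nodes.setdefault(normalize(name), etype)
        for head, rel, tail in ext.relations:
            self.edges.add((normalize(head), rel, normalize(tail)))


def chunk(text: str, size: int) -> list[str]:
    # Naive fixed-size chunking by word count.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_graph(document: str, extract: Callable[[str], Extraction],
                chunk_size: int = 200) -> KnowledgeGraph:
    graph = KnowledgeGraph()
    for piece in chunk(document, chunk_size):
        # One joint NER+RE call per chunk, with no separate rewriting pass.
        graph.merge(extract(piece))
    return graph


# Hypothetical stand-in for the fine-tuned model: a tiny gazetteer lookup.
GAZETTEER = {"John Doe": "PERSON", "Laredo": "LOCATION"}


def fake_extract(piece: str) -> Extraction:
    text = piece.casefold()
    ents = [(n, t) for n, t in GAZETTEER.items() if n.casefold() in text]
    found = {n for n, _ in ents}
    rels = [("John Doe", "OPERATED_IN", "Laredo")] if {"John Doe", "Laredo"} <= found else []
    return Extraction(ents, rels)


# Invented sample text, split into two chunks of at most 10 words.
DOC = ("Defendant John Doe was stopped near Laredo with six passengers. "
       "Agents testified that JOHN DOE drove from Laredo twice.")

graph = build_graph(DOC, fake_extract, chunk_size=10)
print(graph.nodes)  # {'john doe': 'PERSON', 'laredo': 'LOCATION'}
print(graph.edges)  # {('john doe', 'OPERATED_IN', 'laredo')}
```

Both chunks mention the same defendant and location, so merging on normalized names keeps the graph at two nodes and one edge instead of appending a second copy of each.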