
LLM-based Schema-Guided Extraction and Validation of Missing-Person Intelligence from Heterogeneous Data Sources

arXiv – CS AI | Joshua Castillo, Ravi Mukkamala
🤖 AI Summary

Researchers introduce Guardian Parser Pack, an AI-driven system that extracts and normalizes missing-person intelligence from heterogeneous documents by combining LLM-assisted parsing with schema validation. The system achieved an F1 score of 86.64% in manual evaluation and raised data completeness to 96.97%, suggesting that probabilistic AI can be practical in high-stakes investigative workflows.

Analysis

Guardian Parser Pack addresses a critical operational challenge in missing-person investigations: turning fragmented, inconsistently formatted case documents into standardized, searchable intelligence. Law enforcement agencies currently manage dozens of document types (forms, posters, web profiles), each with its own layout and terminology, which creates bottlenecks in case triage and spatial analysis. The research tackles that friction with a layered pipeline that combines deterministic and LLM-based extraction pathways.

The technical approach reflects maturing practice for deploying AI in regulated domains. Rather than treating LLMs as black boxes, the system uses schema-first design and validator-guided repair so that outputs remain auditable and reversible. The F1 gain over deterministic extraction (86.64% versus 25.78%) arguably justifies the higher computational cost (3.95 versus 0.03 seconds per record) for high-value cases, while the deterministic pathway remains available as a fallback for batch processing.
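The validate-then-repair idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the `SCHEMA` fields, the `validate` and `extract_with_repair` helpers, and the two stub functions standing in for the LLM extractor and repair step are all hypothetical.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]

# Hypothetical schema: field name -> regex the whole value must match.
SCHEMA: Dict[str, str] = {
    "name": r"[A-Za-z][A-Za-z .'-]+",
    "age": r"\d{1,3}",
    "last_seen_date": r"\d{4}-\d{2}-\d{2}",
}


def validate(record: Record) -> List[str]:
    """Return one error message per missing or malformed field."""
    errors = []
    for field, pattern in SCHEMA.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing")
        elif not re.fullmatch(pattern, value):
            errors.append(f"{field}: invalid value {value!r}")
    return errors


def extract_with_repair(
    text: str,
    extractor: Callable[[str], Record],
    repairer: Callable[[str, Record, List[str]], Record],
    max_rounds: int = 2,
) -> Tuple[Record, List[str]]:
    """Extract a record, then feed validator errors back to a repair step."""
    record = extractor(text)
    for _ in range(max_rounds):
        errors = validate(record)
        if not errors:
            break
        record = repairer(text, record, errors)
    # Remaining errors are returned so a human reviewer can audit them.
    return record, validate(record)


def stub_extractor(text: str) -> Record:
    """Stand-in for an LLM call; returns a deliberately flawed record."""
    return {"name": "Jane Doe", "age": "sixteen", "last_seen_date": None}


def stub_repairer(text: str, record: Record, errors: List[str]) -> Record:
    """Stand-in for an LLM repair call driven by the validator's errors."""
    fixed = dict(record)
    if any(e.startswith("age") for e in errors):
        fixed["age"] = "16"
    if any(e.startswith("last_seen_date") for e in errors):
        match = re.search(r"\d{4}-\d{2}-\d{2}", text)
        fixed["last_seen_date"] = match.group(0) if match else None
    return fixed


record, remaining = extract_with_repair(
    "Jane Doe, 16, last seen 2024-03-05", stub_extractor, stub_repairer
)
```

Returning the leftover validation errors alongside the record, rather than raising, keeps the output reviewable by a human, which matches the auditability goal described above.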

For law enforcement and child-safety organizations, this represents tangible progress toward AI-augmented investigation at scale. Higher data completeness (96.97%) supports faster geographic profiling and pattern matching. The work also models responsible AI deployment: testing against gold-standard aligned data, maintaining human-reviewable audit trails, and accepting computational trade-offs for accuracy gains in life-safety contexts.

Future deployment hinges on real-world validation across diverse case types, cross-jurisdictional data harmonization, and integration with existing case management systems. The schema-first framework appears portable to related domains, such as human trafficking and elder disappearances, where document heterogeneity similarly impedes rapid analysis.

Key Takeaways
  • LLM-assisted extraction achieved an F1 score of 86.64%, versus 25.78% for deterministic methods, on missing-person case documents.
  • System design prioritizes auditability through schema validation and validator-guided repair rather than end-to-end AI automation.
  • Aggregate data completeness improved to 96.97%, although the LLM pathway is roughly 130x slower per record (3.95 versus 0.03 seconds).
  • Multi-engine PDF extraction with OCR fallback handles real-world document degradation common in investigative archives.
  • Deterministic pathway remains available for batch processing where speed outweighs extraction quality requirements.
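
The multi-engine extraction with OCR fallback mentioned in the takeaways can be pictured as an ordered chain of text extractors. The sketch below is illustrative only: the `extract_text` helper, the `min_chars` threshold, and the stub engines are assumptions, and no real PDF or OCR library is called.

```python
from typing import Callable, List, Optional, Tuple

# An engine is a (label, function) pair; the function maps PDF bytes to text.
Engine = Tuple[str, Callable[[bytes], str]]


def extract_text(
    pdf_bytes: bytes, engines: List[Engine], min_chars: int = 20
) -> Tuple[Optional[str], str]:
    """Try each engine in order; accept the first one that yields enough text."""
    for name, engine in engines:
        try:
            text = engine(pdf_bytes).strip()
        except Exception:
            # A failing engine is skipped so the next one can be tried.
            continue
        if len(text) >= min_chars:
            return name, text
    return None, ""


def broken_engine(data: bytes) -> str:
    """Stub: simulates a parser that cannot read the file."""
    raise ValueError("unreadable PDF structure")


def sparse_engine(data: bytes) -> str:
    """Stub: simulates a scanned page with almost no embedded text."""
    return "  p.1 "


def ocr_engine(data: bytes) -> str:
    """Stub: simulates OCR output for a scanned poster."""
    return "MISSING: Jane Doe, age 16, last seen 2024-03-05"


used, text = extract_text(
    b"%PDF-stub",
    [("primary", broken_engine), ("secondary", sparse_engine), ("ocr", ocr_engine)],
)
```

Placing OCR last reflects the usual assumption that it is the slowest and noisiest option, so it runs only when the cheaper engines fail or return too little text.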