Automatic Extraction of Structured Information from Brain MRI Reports Using an Open-Weight Large Language Model

arXiv – CS AI | Kaouther Mouheb, Amos Pomp, Antoine Manenti, Romy de Haan, Farog Faghir, Joy Martens, Harro Seelaar, Francesco Mattace-Raso, Meike W. Vernooij, Frank J. Wolters, Stefan Klein, Esther E. Bron
AI Summary

Researchers evaluated LLaMA 3.1, an open-weight large language model, for extracting structured information from Dutch brain MRI reports. The model reached 87-96% accuracy on visual rating scores and 82-93% on mention detection, and few-shot prompting further improved performance on numerical variables. The results suggest that automated extraction of structured data from radiology reports is practical with an openly available model.

Analysis

This study addresses a persistent bottleneck in medical research: the manual extraction of structured data from unstructured clinical reports. Evaluating LLaMA 3.1 on 947 Dutch neuroradiology reports, the researchers show that an open-weight LLM can automate much of this labor-intensive process without specialized medical fine-tuning. The high accuracy on visual rating scales (87-96%) and mention detection (82-93%) suggests the model handles clinical terminology and radiological concepts well in Dutch-language reports.

The research arrives as healthcare systems worldwide grapple with digitization and data standardization. Previous studies on LLMs in radiology have primarily focused on English-language reports or proprietary models, which makes this evaluation of an open-weight model on non-English reports particularly valuable for international medical institutions. Translating the reports into English before extraction yielded results comparable to processing the original Dutch text, which broadens accessibility. Precise numerical quantification remains a challenge, however, and that is a significant limitation for clinical metrics.

For healthcare AI developers and medical institutions, this work supports open-weight alternatives to expensive proprietary solutions, which can reduce deployment costs and increase transparency. The gain from few-shot prompting (microbleed counting accuracy rose from 80% to 92%) suggests that a handful of in-prompt examples can substantially boost performance, which keeps implementation practical. However, the persistent weakness on location-specific variables indicates that complex spatial reasoning still calls for model improvements or hybrid human-AI approaches.
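To make the few-shot idea concrete, here is a minimal sketch of how a prompt with worked examples might be assembled for microbleed counting. The variable name `microbleed_count`, the example sentences, and the JSON answer format are all hypothetical illustrations and are not taken from the study; the call to the model itself is left out.

```python
import json

# Hypothetical worked examples; the report snippets and the variable
# name "microbleed_count" are illustrative, not from the study.
FEW_SHOT_EXAMPLES = [
    ("Two microbleeds in the left frontal lobe.", {"microbleed_count": 2}),
    ("No microbleeds are seen.", {"microbleed_count": 0}),
]


def build_few_shot_prompt(report_text, examples=FEW_SHOT_EXAMPLES):
    """Assemble a prompt: instruction, worked examples, then the new report."""
    parts = ["Extract the requested variables from the brain MRI report as JSON."]
    for text, answer in examples:
        parts.append(f"Report: {text}\nAnswer: {json.dumps(answer)}")
    # The prompt ends with an open "Answer:" for the model to complete.
    parts.append(f"Report: {report_text}\nAnswer:")
    return "\n\n".join(parts)


def parse_answer(raw):
    """Parse the model's reply as JSON; return None if it is not valid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None
```

Returning `None` on malformed output, rather than raising, is one way to route unparseable replies to human review, in line with the hybrid human-AI approach mentioned above.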

Looking forward, institutions implementing automated report processing should monitor refinements in open-weight models and benchmark performance against domain-specific baselines. The study's methodology, which evaluates the model across multiple report splits and includes an inter-rater reliability assessment, offers a template for validating clinical LLM applications.
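As a rough illustration of the two kinds of measurement mentioned here, the sketch below computes per-variable accuracy against a reference annotation and an agreement score between two annotators. The summary does not say which reliability statistic the study used, so Cohen's kappa is an assumption chosen for illustration, and the function names are hypothetical.

```python
from collections import Counter


def accuracy(predicted, reference):
    """Fraction of items where the extracted value matches the reference."""
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    Assumption: the study's reliability statistic is not stated in the
    summary; kappa is used here only as a common choice.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

Comparing model accuracy with agreement between human annotators helps show whether remaining errors reflect model weakness or genuine ambiguity in the reports.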

Key Takeaways
  • LLaMA 3.1 achieved 87-96% accuracy on visual rating scales in brain MRI reports, demonstrating strong capability for standard neuroradiology assessments.
  • Few-shot prompting substantially improved numerical variable extraction, boosting microbleed counting accuracy from 80% to 92%.
  • The model showed comparable performance on Dutch and English-translated reports, which could enable broader international deployment.
  • Challenges persist with location-specific variables and precise numerical quantification, requiring additional refinement or human oversight.
  • This validation supports cost-effective automation of clinical data extraction in healthcare systems using openly available models.