AI · Neutral · arXiv – CS AI · 9h ago · 6/10
Measuring the sensitivity of LLM-based structured extraction to prompt, model, and schema choices in clinical discharge summaries
Researchers evaluated how sensitive large language models are to variations in prompts, model sizes, and data schemas when extracting structured data from clinical notes. The study found that schema design, particularly the distinction between absent and undocumented information, drives disagreement more than prompt phrasing, while model choice significantly affects multi-class categorization tasks.
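To make the absent-versus-undocumented distinction concrete, here is a minimal Python sketch. It is not the study's schema or metric: the `Finding` labels, the `disagreement_rate` helper, and the three example outputs are all hypothetical, invented for illustration only. It shows how a three-way field can register disagreement between extraction runs that a binary present/not-present field would hide.

```python
from enum import Enum
from itertools import combinations
from typing import Sequence


class Finding(str, Enum):
    """Hypothetical three-way label for one extracted field."""
    PRESENT = "present"            # the note states the finding
    ABSENT = "absent"              # the note explicitly rules it out
    UNDOCUMENTED = "undocumented"  # the note does not mention it


def disagreement_rate(labels: Sequence[object]) -> float:
    """Fraction of unordered run pairs that assign different labels."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


# Invented outputs from three extraction runs on the same note
# (for example, three prompt variants). Not data from the study.
runs = [Finding.ABSENT, Finding.UNDOCUMENTED, Finding.ABSENT]

# Disagreement under the three-way schema.
three_way = disagreement_rate(runs)

# The same runs collapsed to a binary "is it present?" schema.
binary = disagreement_rate([label is Finding.PRESENT for label in runs])
```

In this toy case the runs disagree under the three-way schema but agree once the labels are collapsed to a binary field, which is one way a schema choice alone can change measured disagreement.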