AI · Bullish · arXiv – CS AI · 10h ago · 7/10
Large Language Model-Assisted Cleaning of Report-Derived Labels in a Large-Scale Chest CT Dataset
Researchers used GPT-5.4 to identify labeling errors in CT-RATE, a large-scale chest CT dataset with 24,434 radiology reports and 439,812 label instances. The LLM's labels agreed with the existing labels in 96.4% of instances, and radiologists who reviewed the flagged discordances judged the model correct in 74-92% of those cases. The results point to LLM-assisted cleaning as a scalable way to improve dataset quality.
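As a quick sanity check on the figures quoted in the summary, the short Python sketch below derives how many labels each report carries and roughly how many label instances the LLM would have flagged as discordant. The variable names are illustrative, the even split of labels across reports is an assumption, and the 96.4% figure is rounded, so the discordant count is only approximate.

```python
# Figures quoted in the summary.
n_reports = 24_434
n_label_instances = 439_812
agreement_rate = 0.964  # rounded in the summary, so derived counts are approximate

# Labels per report, assuming every report carries the same number of labels.
labels_per_report, remainder = divmod(n_label_instances, n_reports)

# Approximate number of label instances where the LLM disagreed with the existing label.
approx_discordant = round(n_label_instances * (1 - agreement_rate))

print(labels_per_report, remainder)  # 18 0
print(approx_discordant)             # 15833
```

The label instances divide evenly into 18 per report, and the 3.6% disagreement corresponds to roughly 15,800 label instances, which is the pool from which the radiologist-reviewed cases would be drawn.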
🏢 Microsoft · 🧠 GPT-5