AI · Bullish · arXiv – CS AI · 7h ago · 7/10
Simple Token-Efficient Vision-Language Model for Case-level Pathology Synoptic Report Generation
Researchers present an efficient vision-language model that generates pathology synoptic reports from whole-slide images (WSIs). Optimized patch sampling reduces the input sequence length by 64x, and training requires only half an NVIDIA H100 GPU. The two-stage approach combines WSI captioning with case-level fine-tuning to handle multi-slide pathology cases, establishing a reproducible baseline for resource-constrained medical AI development.