AIBullisharXiv – CS AI · 18h ago7/10
🧠
FineGen: A VLM-based Multi-Agent Framework for Fine-Grained Image-Text Dataset Construction
FineGen is a VLM-based multi-agent framework that automatically constructs vision-language datasets by generating hard negative samples through a Generation-Verification-Correction pipeline. The resulting FineGen-100K dataset contains 147,000+ attribute-specific hard negatives and demonstrates a 14.4% accuracy improvement on fine-grained object detection benchmarks, addressing a critical gap in existing datasets.