AI · Bullish · arXiv – CS AI · 15h ago · 6/10
FAST-GOAL: Fast and Efficient Global-local Object Alignment Learning
Researchers introduce FAST-GOAL, a fine-tuning method that improves CLIP's handling of long text descriptions through global-local semantic alignment. The approach pairs object detection with token-level similarity learning and introduces GLIT100k, a new dataset that links long captions to localized image-text pairs. The authors report significant performance gains across multiple benchmarks.
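
As a rough illustration of what a global-local alignment objective can look like, here is a minimal NumPy sketch. It is not the paper's implementation: it assumes the local pairs are already available as pooled embeddings (for example, detector crops matched to caption segments), it swaps in a standard symmetric contrastive (InfoNCE) loss, and every name in it (`contrastive_loss`, `global_local_loss`, `local_weight`, `temperature`) is hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_loss(a, b, temperature=0.07):
    """Symmetric InfoNCE loss; row i of `a` is assumed to match row i of `b`."""
    logits = l2_normalize(a) @ l2_normalize(b).T / temperature
    idx = np.arange(len(a))
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()  # a -> b direction
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()  # b -> a direction
    return 0.5 * (loss_ab + loss_ba)


def global_local_loss(img_global, txt_global, img_local, txt_local,
                      local_weight=1.0):
    """Assumed combination: global (image vs. long caption) term plus a
    weighted local (region vs. caption segment) term."""
    global_term = contrastive_loss(img_global, txt_global)
    local_term = contrastive_loss(img_local, txt_local)
    return global_term + local_weight * local_term
```

The weighted sum is only one plausible way to combine the two terms; the summary does not say how FAST-GOAL balances global and local alignment.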