Fashion Florence: Fine-Tuning Florence-2 for Structured Fashion Attribute Extraction
Researchers have fine-tuned Florence-2, a vision-language model, to extract structured fashion attributes from clothing images, reaching 94.6% category accuracy. The resulting model, Fashion Florence, outperforms GPT-4o-mini and Gemini 2.5 Flash on fashion-specific tasks while using only 0.77B parameters, showing that specialized models can exceed general-purpose alternatives in narrow domains.
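
To make "category accuracy" on structured output concrete, here is a minimal sketch of how such a metric could be computed. It assumes the model emits one JSON object per image with a `category` field. That schema, the field names, and the helper functions `parse_attributes` and `category_accuracy` are hypothetical illustrations, not the authors' evaluation code. Malformed outputs are counted as errors here, which is one possible design choice among several.

```python
import json
from typing import Optional


def parse_attributes(raw: str) -> Optional[dict]:
    """Parse a model's text output as a JSON object; return None if malformed."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def category_accuracy(outputs: list[str], gold_categories: list[str]) -> float:
    """Fraction of outputs whose 'category' field matches the gold label.

    Assumption: matching is case-insensitive and ignores surrounding
    whitespace. Outputs that fail to parse count as incorrect.
    """
    if len(outputs) != len(gold_categories):
        raise ValueError("outputs and gold_categories must have the same length")
    if not outputs:
        return 0.0

    correct = 0
    for raw, gold in zip(outputs, gold_categories):
        attrs = parse_attributes(raw)
        if attrs is None:
            continue  # malformed output: counted as a miss
        predicted = str(attrs.get("category", "")).strip().lower()
        if predicted == gold.strip().lower():
            correct += 1
    return correct / len(outputs)


if __name__ == "__main__":
    # Made-up example outputs, not data from the study.
    sample_outputs = [
        '{"category": "Dress", "color": "red"}',
        "not json",
        '{"category": "jacket"}',
        '{"category": "skirt"}',
    ]
    sample_gold = ["dress", "dress", "jacket", "trousers"]
    print(category_accuracy(sample_outputs, sample_gold))
```

Counting unparseable outputs as misses penalizes a model for breaking the expected format, which matters when downstream systems consume the attributes directly.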