AIBearisharXiv โ CS AI ยท 8h ago7/10
๐ง
The LLM Bottleneck: Why Open-Source Vision LLMs Struggle with Hierarchical Visual Recognition
Research reveals that open-source large language models (LLMs) lack hierarchical knowledge of visual taxonomies, creating a bottleneck for vision LLMs in hierarchical visual recognition tasks. The study used one million visual question answering tasks across six taxonomies to demonstrate this limitation, finding that even fine-tuning cannot overcome the underlying LLM knowledge gaps.