Back to the Barn with LLAMAs: Evolving Pretrained LLM Backbones in Finetuning Vision Language Models
Researchers conducted a systematic study comparing Vision-Language Models built with LLAMA-1, LLAMA-2, and LLAMA-3 backbones, finding that newer LLM architectures do not universally improve VLM performance; the benefits are task-dependent. Gains vary significantly by task: visual question-answering benefits from the stronger reasoning of newer backbones, while vision-heavy tasks see minimal gains from upgrading the language backbone alone.
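
To make the backbone-swapping setup concrete, here is a minimal sketch of a LLaVA-style VLM in which the pretrained LLM is an interchangeable component behind a trained projector. This is not the authors' code: the `SimpleVLM` class, the CLIP vision tower, and the HuggingFace model IDs are assumptions chosen for illustration.

```python
# Minimal LLaVA-style VLM sketch (illustrative, not the paper's implementation).
# The language backbone is a swappable pretrained LLM; only the projector
# (and optionally the LLM) would be updated during finetuning.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, CLIPVisionModel


class SimpleVLM(nn.Module):
    def __init__(self, vision_id: str, llm_id: str):
        super().__init__()
        # Frozen vision tower: produces patch embeddings for an image.
        self.vision = CLIPVisionModel.from_pretrained(vision_id)
        for p in self.vision.parameters():
            p.requires_grad = False
        # Interchangeable language backbone: changing llm_id moves between
        # LLaMA generations while the rest of the architecture stays fixed.
        self.llm = AutoModelForCausalLM.from_pretrained(llm_id)
        # Linear projector mapping vision features into the LLM's
        # token-embedding space; this is the glue trained in finetuning.
        self.projector = nn.Linear(
            self.vision.config.hidden_size, self.llm.config.hidden_size
        )

    def forward(self, pixel_values, input_ids):
        # Encode image patches and project them into "visual tokens".
        vis_feats = self.vision(pixel_values=pixel_values).last_hidden_state
        visual_tokens = self.projector(vis_feats)
        # Prepend visual tokens to the text embeddings and run the LLM.
        text_embeds = self.llm.get_input_embeddings()(input_ids)
        inputs_embeds = torch.cat([visual_tokens, text_embeds], dim=1)
        return self.llm(inputs_embeds=inputs_embeds)


# Swapping the backbone is a one-argument change, e.g. (hypothetical IDs):
# vlm = SimpleVLM("openai/clip-vit-large-patch14", "meta-llama/Llama-2-7b-hf")
```

Under this design, the study's comparison amounts to holding the vision tower and projector recipe constant while varying only `llm_id`, which is why backbone upgrades can help reasoning-heavy VQA yet leave vision-heavy tasks largely unchanged.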