arXiv · CS AI
iGVLM: Dynamic Instruction-Guided Vision Encoding for Question-Aware Multimodal Understanding
Researchers propose iGVLM, a framework aimed at a limitation of Large Vision-Language Models (LVLMs): their vision encoders typically process an image the same way regardless of the question being asked. iGVLM instead makes visual encoding dynamic and guided by the instruction. A dual-branch architecture enables task-specific visual reasoning while preserving the knowledge of the pre-trained vision encoder.
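The summary does not say how the two branches are built or merged. As a rough illustration only, the sketch below assumes one common pattern. A frozen branch stands in for the pre-trained encoder. An instruction-conditioned branch applies a FiLM-style per-channel scale and shift. The two are combined through a tanh gate initialized to zero, so training starts from the unchanged pre-trained features. All names (`DualBranchEncoder`, `w_gamma`, `w_beta`, `gate`) are hypothetical, and this is not iGVLM's published architecture.

```python
import math
from dataclasses import dataclass
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


@dataclass
class DualBranchEncoder:
    """Hypothetical two-branch visual encoder; NOT iGVLM's actual design."""
    w_frozen: Matrix   # stands in for a pre-trained encoder; never updated
    w_gamma: Matrix    # instruction embedding -> per-channel scale (assumed)
    w_beta: Matrix     # instruction embedding -> per-channel shift (assumed)
    gate: float = 0.0  # learnable scalar; tanh(0) = 0, so output starts as frozen features

    def encode(self, patches: List[Vector], instruction: Vector) -> List[Vector]:
        gamma = matvec(self.w_gamma, instruction)
        beta = matvec(self.w_beta, instruction)
        g = math.tanh(self.gate)
        out = []
        for patch in patches:
            # Instruction-agnostic branch: the "pre-trained" features.
            base = matvec(self.w_frozen, patch)
            # Instruction-guided branch: FiLM-style modulation of the base features.
            guided = [s * b + t for s, b, t in zip(gamma, base, beta)]
            # Gated residual: frozen features are always kept; the guided part is added on top.
            out.append([b + g * m for b, m in zip(base, guided)])
        return out


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    enc = DualBranchEncoder(w_frozen=identity, w_gamma=identity, w_beta=zeros)
    patches = [[2.0, 3.0]]
    print(enc.encode(patches, [1.0, 0.0]))  # gate = 0: identical to the frozen features
    enc.gate = 1.0
    print(enc.encode(patches, [1.0, 0.0]))  # now the instruction changes the output
    print(enc.encode(patches, [0.0, 1.0]))
```

The zero-initialized gate is one way to reconcile the two goals in the summary: at initialization the encoder reproduces the pre-trained features exactly, and instruction-specific behavior is added only as the gate is learned.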