AINeutralarXiv โ CS AI ยท 4h ago6/10
๐ง
MODIX: A Training-Free Multimodal Information-Driven Positional Index Scaling for Vision-Language Models
Researchers introduce MODIX, a training-free framework that dynamically optimizes how Vision-Language Models allocate attention across multimodal inputs by adjusting positional encoding based on information density rather than uniform token assignment. The approach improves reasoning performance without modifying model parameters, suggesting positional encoding should be treated as an adaptive resource in multimodal transformer architectures.