AI · Bullish · arXiv – CS AI · 9h ago · 7/10
DRIFT: A Residual Flow Adapter for Decoding Continuous Outputs in Vision-Language Models
Researchers introduce DRIFT, a framework that adapts pretrained vision-language models to handle continuous numerical outputs rather than discrete tokens. By combining a base predictor with a flow-matching refinement module, DRIFT improves performance on tasks like temporal localization and robotic control across multiple model architectures.
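The base-plus-refinement idea can be illustrated with a toy sketch. This is not DRIFT's implementation: the names `base_predictor`, `toy_velocity`, and `refine` are hypothetical, the velocity field is an oracle that is handed the true residual (a trained flow-matching model would have to learn it from data), and the straight-line probability path is an assumption the summary does not confirm.

```python
import random


def base_predictor(features):
    # Hypothetical coarse predictor standing in for the model's base estimate.
    return sum(features) / len(features)


def toy_velocity(residual, t, target_residual):
    # Oracle stand-in for a learned velocity network (an assumption).
    # On the straight-line path x_t = (1 - t) * x_0 + t * x_1, the velocity
    # toward a known endpoint x_1 is (x_1 - x_t) / (1 - t).
    return (target_residual - residual) / (1.0 - t)


def refine(base, target_residual, steps=8, seed=0):
    # Start the residual from Gaussian noise and integrate the velocity
    # field from t = 0 to t = 1 with fixed-step Euler updates.
    rng = random.Random(seed)
    residual = rng.gauss(0.0, 1.0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so the division above is safe
        residual += dt * toy_velocity(residual, t, target_residual)
    # The final continuous output is the base estimate plus the residual.
    return base + residual


coarse = base_predictor([2.0, 4.0])            # coarse continuous estimate
refined = refine(coarse, target_residual=0.4)  # base plus refined residual
```

With the oracle field, the final Euler step lands on the target residual, so `refined` matches `coarse + 0.4` up to floating-point rounding. With a learned field the refinement would only approximate the true residual.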