y0news
#vision-language2 articles
2 articles
AIBearisharXiv โ€“ CS AI ยท 6h ago2
๐Ÿง 

CaptionFool: Universal Image Captioning Model Attacks

Researchers have developed CaptionFool, a universal adversarial attack that can manipulate AI image captioning models by modifying just 1.2% of image patches. The attack achieves 94-96% success rates in forcing models to generate arbitrary captions, including offensive content that can bypass content moderation systems.

AIBullisharXiv โ€“ CS AI ยท 6h ago2
๐Ÿง 

Unified Vision-Language Modeling via Concept Space Alignment

Researchers introduce V-SONAR, a vision-language embedding system that extends text-only SONAR to support 1500+ languages with vision capabilities. The system demonstrates state-of-the-art performance on video captioning and multilingual vision tasks through V-LCM, which combines vision and language processing in a unified framework.