y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#video-language-models News & Analysis

3 articles tagged with #video-language-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

3 articles
AIBullisharXiv – CS AI · 2d ago7/10
🧠

MACD: Model-Aware Contrastive Decoding via Counterfactual Data

Researchers introduce MACD, a new inference strategy that reduces hallucinations in video language models by using the model's own feedback to identify problematic visual regions and generate targeted counterfactual data. The method combines model-aware object-level modifications with contrastive decoding, showing consistent improvements across multiple benchmarks and video-LLM architectures.

AIBullisharXiv – CS AI · May 116/10
🧠

TTF: Temporal Token Fusion for Efficient Video-Language Model

Researchers introduce Temporal Token Fusion (TTF), a training-free compression technique that reduces visual tokens in video-language models by 67% while maintaining 99.5% accuracy. The method addresses the critical bottleneck of LLM prefill costs in video understanding by identifying and fusing redundant tokens across video frames using local similarity matching.

AIBullisharXiv – CS AI · Mar 37/107
🧠

QuickGrasp: Responsive Video-Language Querying Service via Accelerated Tokenization and Edge-Augmented Inference

Researchers propose QuickGrasp, a video-language querying system that combines local processing with edge computing to achieve both fast response times and high accuracy. The system achieves up to 12.8x reduction in response delay while maintaining the accuracy of large video-language models through accelerated tokenization and adaptive edge augmentation.