y0news

#video-llm News & Analysis

4 articles tagged with #video-llm. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 6d ago · 7/10
🧠

Distorted or Fabricated? A Survey on Hallucination in Video LLMs

Researchers have conducted a comprehensive survey of hallucinations in Video Large Language Models (Vid-LLMs). It identifies two core types, dynamic distortion and content fabrication, and traces their root causes to limitations in temporal representation and insufficient visual grounding. The study reviews evaluation benchmarks and mitigation strategies, and proposes future directions, including motion-aware encoders and counterfactual learning, to improve reliability.

AI · Neutral · arXiv – CS AI · Mar 17 · 7/10
🧠

From Evaluation to Defense: Advancing Safety in Video Large Language Models

Researchers introduced VideoSafetyEval, a benchmark revealing that video-based large language models have 34.2% worse safety performance than image-based models. They also developed VideoSafety-R1, a dual-stage framework that achieves a 71.1% improvement in safety through alarm-token-guided fine-tuning and safety-guided reinforcement learning.

AI · Bearish · arXiv – CS AI · Mar 3 · 7/10 · 8
🧠

VidDoS: Universal Denial-of-Service Attack on Video-based Large Language Models

Researchers have discovered VidDoS, a new universal attack framework that can severely degrade video-based large language models by exhausting their computational resources. The attack increases token generation by over 205× and inference latency by more than 15×, creating critical safety risks in real-world applications such as autonomous driving.

AI · Neutral · arXiv – CS AI · Mar 16 · 4/10
🧠

Geometry-Guided Camera Motion Understanding in VideoLLMs

Researchers developed a framework that uses geometric analysis to improve video-language models' understanding of camera motion. The study introduces the CameraMotionDataset and the CameraMotionVQA benchmark, shows that current VideoLLMs struggle to recognize camera motion, and proposes a lightweight solution built on 3D foundation models.