y0news

#video-analysis News & Analysis

14 articles tagged with #video-analysis. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Mar 17 · 7/10

Human-AI Ensembles Improve Deepfake Detection in Low-to-Medium Quality Videos

Research comparing 200 humans and 95 AI detectors found humans significantly outperform AI at detecting deepfakes, especially in low-quality mobile phone videos where AI accuracy drops to near chance levels. The study reveals human-AI hybrid systems are most effective, as humans and AI make complementary errors in deepfake detection.
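
To see why complementary errors favor a hybrid system, here is a toy sketch. The scores and labels are invented for illustration and are not data from the study, and simple score averaging is an assumed fusion rule, not necessarily the one the researchers used.

```python
# Toy illustration only: all numbers below are made up, not taken from the study.

def ensemble_scores(human_scores, ai_scores, human_weight=0.5):
    """Weighted average of per-video 'fake' probabilities (hypothetical fusion rule)."""
    if len(human_scores) != len(ai_scores):
        raise ValueError("score lists must have the same length")
    return [
        human_weight * h + (1.0 - human_weight) * a
        for h, a in zip(human_scores, ai_scores)
    ]

def accuracy(scores, labels, threshold=0.5):
    """Fraction of videos whose thresholded score matches the label (1 = fake)."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

labels = [1, 1, 0, 0]          # ground truth: two fakes, two real videos
human = [0.9, 0.4, 0.1, 0.3]   # humans miss the second fake
ai = [0.8, 0.9, 0.2, 0.6]      # the detector flags the last real video as fake

combined = ensemble_scores(human, ai)

print(accuracy(human, labels))     # humans alone
print(accuracy(ai, labels))        # detector alone
print(accuracy(combined, labels))  # averaged scores
```

Because each side errs on a different video in this toy setup, the averaged score lands on the correct side of the threshold in both cases. When errors overlap instead, averaging offers no such gain.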

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

Frozen Multimodal Embeddings for Personality and Cognitive Ability Assessment in Asynchronous Video Interviews

Researchers developed a multimodal machine learning approach using frozen pretrained encoders (CLIP, Whisper, RoBERTa) to predict personality traits and cognitive ability from asynchronous video interviews, achieving 19.1% improvement over baseline on personality assessment but revealing potential dataset shortcuts in cognitive ability evaluation.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization

RelayFormer is a new deep learning framework that unifies image and video manipulation detection through a flexible attention mechanism called Global Local Relay (GLR) tokens. The approach handles variable resolutions without distortion and processes both static and temporal data with a single architecture, addressing key limitations in current visual forensics methods.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Hybrid Robustness Verification for Spatio-Temporal Neural Networks

Researchers introduce Spatio-Temporal Bound Propagation (STBP), a verification framework for neural networks processing video and volumetric data that provides formal robustness guarantees under realistic adversarial constraints. The method achieves 1.7x higher certified robust accuracy compared to existing approaches while maintaining computational scalability, addressing a critical gap in AI safety for applications like autonomous driving and medical imaging.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

MotionEnhancer: Leveraging Video Diffusion for Motion-Enhanced Vision-Language Models

Researchers introduce MotionEnhancer, a novel technique that combines Video Diffusion Models with Vision-Language Models to improve fine-grained motion understanding in video analysis. The parameter-free approach uses attention alignment to extract motion priors without requiring additional training or architectural modifications, achieving consistent improvements on motion-understanding benchmarks.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

UNIVID: Unified Vision-Language Model for Video Moderation

Researchers introduce UNIVID, a unified vision-language model designed for large-scale video moderation that generates interpretable policy-aware captions instead of opaque classification outputs. The system reduces violation detection errors by 42.7% and false positives by 37.0% while consolidating over 1,000 specialized models into a single backbone, demonstrating practical AI efficiency gains in content moderation infrastructure.

AI · Neutral · arXiv – CS AI · Jun 2 · 5/10

Understanding-Enhanced Model Collaboration for Long-Tailed Egocentric Mistake Detection

Researchers introduce UE-MCM, a dual-model AI system that combines small and large models to detect mistakes in egocentric instructional videos, particularly excelling at identifying rare errors through adaptive fusion and long-tailed distribution handling. The approach balances computational efficiency with accuracy for practical deployment in video analysis tasks.

AI · Neutral · arXiv – CS AI · May 28 · 5/10

Mining Multi-Modality Spatio-Temporal Cues for Video Important Person Identification

Researchers introduce the Video Important Person (VIP) identification task and Temporal-VIP dataset to automatically identify key individuals in video scenes while addressing the Temporal Importance Shift phenomenon. The VIP-Net framework achieves 67.3% accuracy, significantly outperforming existing methods (37.5%-53.9%), with applications in automated video editing and intelligent surveillance.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

Suicide Risk Assessment from AI-powered Video Surveillance: An Interpretable Framework for Prevention in Metro Stations

Researchers have developed an interpretable AI framework for assessing suicide risk in metro stations from surveillance video, achieving 83.2% ROC-AUC by combining person tracking, activity recognition, and trajectory analysis. The work addresses a critical public health challenge: identifying high-risk situations early could facilitate timely intervention.

AI · Bullish · arXiv – CS AI · Mar 26 · 6/10

LensWalk: Agentic Video Understanding by Planning How You See in Videos

Researchers introduced LensWalk, an agentic AI framework that enables Large Language Models to actively control their visual observation of videos through dynamic temporal sampling. The system uses a reason-plan-observe loop to progressively gather evidence, achieving 5% accuracy improvements on challenging video benchmarks without requiring model fine-tuning.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10

Is Seeing Believing? Evaluating Human Sensitivity to Synthetic Video

Research finds that humans pick up on credibility problems in deepfake videos through visual and audio distortions. Across three experiments, both technical artifacts and distortions in synthetic media reduced perceived credibility, though human perception of deepfakes remains poorly understood.

AI · Bullish · arXiv – CS AI · Mar 17 · 5/10

Learning Question-Aware Keyframe Selection with Synthetic Supervision for Video Question Answering

Researchers developed a question-aware keyframe selection framework for video question answering that combines pseudo labels generated by large multimodal models with coverage regularization. The method significantly improves accuracy on temporal and causal questions in the NExT-QA dataset and reduces inference costs, making video analysis more efficient.

AI · Bullish · arXiv – CS AI · Mar 17 · 5/10

Speech Recognition on TV Series with Video-guided Post-ASR Correction

Researchers have developed a Video-Guided Post-ASR Correction (VPC) framework that uses Video-Large Multimodal Models to improve speech recognition accuracy in complex environments like TV series. The system addresses challenges with multiple speakers, overlapping speech, and domain-specific terminology by leveraging video context to refine ASR outputs.

AI · Neutral · arXiv – CS AI · Mar 16 · 4/10

Team LEYA in 10th ABAW Competition: Multimodal Ambivalence/Hesitancy Recognition Approach

Team LEYA developed a multimodal AI approach for recognizing ambivalence and hesitancy in videos for the 10th ABAW Competition, combining scene, facial, audio, and text analysis. Their fusion model achieved 83.25% accuracy compared to 70.02% for single-modality approaches, demonstrating significant improvements in behavioral recognition technology.