y0news

#computer-vision News & Analysis

507 articles tagged with #computer-vision. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 13

Pseudo Contrastive Learning for Diagram Comprehension in Multimodal Models

Researchers propose a new training method called pseudo contrastive learning to improve diagram comprehension in multimodal AI models like CLIP. The approach uses synthetic diagram samples to help models better understand fine-grained structural differences in diagrams, showing significant improvements in flowchart understanding tasks.

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 12

See, Act, Adapt: Active Perception for Unsupervised Cross-Domain Visual Adaptation via Personalized VLM-Guided Agent

Researchers introduce Sea² (See, Act, Adapt), an approach that improves AI perception models in new environments by using an intelligent pose-control agent rather than retraining the models themselves. The method keeps perception modules frozen and uses a vision-language model as a controller, improving performance by 13-27% across visual tasks without requiring additional training data.

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 11

Evidential Neural Radiance Fields

Researchers introduce Evidential Neural Radiance Fields, a new probabilistic approach that enables uncertainty quantification in 3D scene modeling while maintaining rendering quality. The method addresses critical limitations in existing NeRF technology by capturing both aleatoric and epistemic uncertainty from a single forward pass, making neural radiance fields more suitable for safety-critical applications.

AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 15

Interpretable Debiasing of Vision-Language Models for Social Fairness

Researchers have developed DeBiasLens, a new framework that uses sparse autoencoders to identify and deactivate social bias neurons in Vision-Language models without degrading their performance. The model-agnostic approach addresses concerns about unintended social bias in VLMs by making the debiasing process interpretable and targeting internal model dynamics rather than surface-level fixes.

AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 12

Hyperdimensional Cross-Modal Alignment of Frozen Language and Image Models for Efficient Image Captioning

Researchers introduce HDFLIM, a new framework that aligns vision and language AI models without requiring computationally expensive fine-tuning by using hyperdimensional computing to create cross-modal mappings while keeping foundation models frozen. The approach achieves comparable performance to traditional training methods while being significantly more resource-efficient.

AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 17

SemVideo: Reconstructs What You Watch from Brain Activity via Hierarchical Semantic Guidance

Researchers introduced SemVideo, a breakthrough AI framework that can reconstruct videos from brain activity using fMRI scans. The system uses hierarchical semantic guidance to overcome previous limitations in visual consistency and temporal coherence, achieving state-of-the-art results in brain-to-video reconstruction.

$RNDR
AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 17

SceneTok: A Compressed, Diffusable Token Space for 3D Scenes

SceneTok introduces a novel 3D scene tokenizer that compresses view sets into permutation-invariant tokens, achieving 1-3 orders of magnitude better compression than existing methods while maintaining state-of-the-art reconstruction quality. The system enables efficient 3D scene generation in 5 seconds using a lightweight decoder that can render novel viewpoints.

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 17

LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans

Researchers have developed LiteReality, a novel pipeline that converts RGB-D scans of indoor environments into compact, realistic 3D virtual replicas suitable for AR/VR, gaming, robotics, and digital twins. The system features scene understanding, object retrieval, material painting, and physics integration to create graphics-ready environments that support object individuality and physically-based rendering.

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 15

DiffusionHarmonizer: Bridging Neural Reconstruction and Photorealistic Simulation with Online Diffusion Enhancer

Researchers introduce DiffusionHarmonizer, an AI framework that enhances neural reconstruction simulations for autonomous robots by converting multi-step image diffusion models into single-step enhancers. The system addresses artifacts in NeRF and 3D Gaussian Splatting methods while improving realism for applications like self-driving vehicle simulation.

AI · Neutral · arXiv – CS AI · Mar 2 · 6/10 · 12

DLEBench: Evaluating Small-scale Object Editing Ability for Instruction-based Image Editing Model

Researchers introduce DLEBench, the first benchmark specifically designed to evaluate how well instruction-based image editing models edit small-scale objects that occupy only 1%-10% of the image area. Testing on 10 models revealed significant performance gaps in small-object editing, highlighting a critical limitation of current AI image editing.

AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 12

MEGS$^{2}$: Memory-Efficient Gaussian Splatting via Spherical Gaussians and Unified Pruning

Researchers introduce MEGS², a new memory-efficient framework for 3D Gaussian Splatting that reduces memory consumption by 50% for static rendering and 40% for real-time rendering. The breakthrough enables 3D rendering on edge devices by replacing memory-intensive spherical harmonics with lightweight spherical Gaussian lobes and implementing unified pruning optimization.

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 19

BEV-VLM: Trajectory Planning via Unified BEV Abstraction

Researchers introduced BEV-VLM, a new autonomous driving trajectory planning system that combines Vision-Language Models with Bird's-Eye View maps from camera and LiDAR data. The approach achieved 53.1% better planning accuracy and complete collision avoidance compared to vision-only methods on the nuScenes dataset.

AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 14

Less is More: Lean yet Powerful Vision-Language Model for Autonomous Driving

Researchers introduce Max-V1, a novel vision-language model framework that treats autonomous driving as a language problem, predicting trajectories from camera input. The model achieved over 30% performance improvement on the nuScenes dataset and demonstrates strong cross-vehicle adaptability.

AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 21

DeepEyesV2: Toward Agentic Multimodal Model

DeepEyesV2 is a new agentic multimodal AI model that combines text and image comprehension with external tools such as code execution and web search. The research introduces a two-stage training pipeline and the RealX-Bench evaluation framework, demonstrating improved real-world reasoning capabilities through adaptive tool invocation.

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 11

Less is More: AMBER-AFNO -- a New Benchmark for Lightweight 3D Medical Image Segmentation

Researchers developed AMBER-AFNO, a new lightweight architecture for 3D medical image segmentation that replaces traditional attention mechanisms with Adaptive Fourier Neural Operators. The model achieves state-of-the-art results on medical datasets while maintaining linear memory scaling and quasi-linear computational complexity.

$NEAR
AI · Neutral · arXiv – CS AI · Mar 2 · 7/10 · 10

Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning

Researchers introduce Veritas, a multi-modal large language model designed for deepfake detection that uses pattern-aware reasoning to mimic human forensic processes. The system addresses real-world challenges through the HydraFake dataset and achieves significant improvements in detecting unseen forgeries across different domains.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 5

BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model

BetterScene is a new AI approach that enhances 3D scene synthesis and novel view generation from sparse photos by leveraging Stable Video Diffusion with improved regularization techniques. The method integrates 3D Gaussian Splatting and addresses consistency issues in existing diffusion-based solutions through temporal equivariance and vision foundation model alignment.

$RNDR
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 7

AeroDGS: Physically Consistent Dynamic Gaussian Splatting for Single-Sequence Aerial 4D Reconstruction

Researchers have developed AeroDGS, a physics-guided 4D Gaussian splatting framework that enables accurate dynamic scene reconstruction from single-view aerial UAV footage. The system addresses key challenges in monocular aerial reconstruction by incorporating physics-based optimization and geometric constraints to resolve depth ambiguity and improve motion estimation.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 8

Autoregressive Visual Decoding from EEG Signals

Researchers developed AVDE, a lightweight framework for decoding visual information from EEG brain signals using autoregressive generation. The system outperforms existing methods while using only 10% of the parameters, potentially advancing practical brain-computer interface applications.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6

Q$^2$: Quantization-Aware Gradient Balancing and Attention Alignment for Low-Bit Quantization

Researchers propose Q², a new framework that addresses gradient imbalance issues in quantization-aware training for complex visual tasks like object detection and image segmentation. The method achieves significant performance improvements (+2.5% mAP for object detection, +3.7% mDICE for segmentation) while introducing no inference-time overhead.

$ADA