507 articles tagged with #computer-vision. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI Bullish · arXiv → CS AI · Mar 2 · 6/10 · 9
🧠 Researchers propose ProtoDCS, a new framework for robust test-time adaptation of Vision-Language Models in open-set scenarios. The method uses Gaussian Mixture Model verification and uncertainty-aware learning to better handle distribution shifts while maintaining computational efficiency.
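For intuition, GMM-style sample verification can be sketched as scoring test features under a mixture of per-class Gaussians and adapting only on samples the mixture explains well. This is an illustrative sketch, not ProtoDCS itself; the prototypes, variance, and threshold below are all invented.

```python
import numpy as np

def gaussian_logpdf(x, mean, var):
    # Log-density of an isotropic Gaussian with scalar variance `var`.
    d = x.shape[-1]
    return -0.5 * (d * np.log(2 * np.pi * var) + np.sum((x - mean) ** 2, axis=-1) / var)

def mixture_score(x, means, var=1.0):
    # Log-likelihood of x under an equal-weight Gaussian mixture whose
    # components sit at class prototypes (one per known class).
    logps = np.stack([gaussian_logpdf(x, m, var) for m in means])  # (K, N)
    return np.logaddexp.reduce(logps, axis=0) - np.log(len(means))

rng = np.random.default_rng(0)
prototypes = np.stack([np.zeros(8), np.full(8, 5.0)])  # two known-class prototypes
inlier = rng.normal(prototypes[0], 0.1)                # near a known class
outlier = np.full(8, 20.0)                             # far from every prototype

scores = mixture_score(np.stack([inlier, outlier]), prototypes)
threshold = -50.0                       # illustrative cutoff
accepted = scores > threshold           # only verified samples drive adaptation
```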
AI Bullish · arXiv → CS AI · Mar 2 · 6/10 · 13
🧠 Researchers propose a new training method called pseudo contrastive learning to improve diagram comprehension in multimodal AI models like CLIP. The approach uses synthetic diagram samples to help models better understand fine-grained structural differences in diagrams, showing significant improvements in flowchart understanding tasks.
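A contrastive objective of this kind is typically an InfoNCE-style loss over positive and negative pairs; the sketch below uses random vectors as stand-ins for real diagram embeddings and is not the paper's exact "pseudo contrastive" formulation.

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    # Standard InfoNCE: pull the anchor toward its positive and push it
    # away from negatives; embeddings are L2-normalized first.
    def norm(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    a, p, n = norm(anchor), norm(positive), norm(negatives)
    logits = np.concatenate([[a @ p], n @ a]) / temperature  # positive first
    logits = logits - logits.max()                           # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

rng = np.random.default_rng(1)
anchor = rng.normal(size=16)
positive = anchor + 0.01 * rng.normal(size=16)  # lightly perturbed "same diagram"
negatives = rng.normal(size=(8, 16))            # structurally different diagrams

loss_easy = info_nce(anchor, positive, negatives)
loss_hard = info_nce(anchor, rng.normal(size=16), negatives)  # mismatched positive
```

A matched positive yields a much smaller loss than a mismatched one, which is exactly the signal synthetic hard negatives are meant to sharpen.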
AI Bullish · arXiv → CS AI · Mar 2 · 6/10 · 12
🧠 Researchers introduce Sea² (See, Act, Adapt), a novel approach that improves AI perception models in new environments by using an intelligent pose-control agent rather than retraining the models themselves. The method keeps perception modules frozen and uses a vision-language model as a controller, achieving significant performance improvements of 13-27% across visual tasks without requiring additional training data.
AI Bullish · arXiv → CS AI · Mar 2 · 6/10 · 11
🧠 Researchers introduce Evidential Neural Radiance Fields, a new probabilistic approach that enables uncertainty quantification in 3D scene modeling while maintaining rendering quality. The method addresses critical limitations in existing NeRF technology by capturing both aleatoric and epistemic uncertainty from a single forward pass, making neural radiance fields more suitable for safety-critical applications.
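A common evidential-regression recipe (which this line of work builds on) has the network emit Normal-Inverse-Gamma parameters, from which both uncertainty types follow in closed form from one forward pass; the numbers below are hypothetical head outputs, not values from the paper.

```python
def evidential_uncertainty(gamma, nu, alpha, beta):
    # Normal-Inverse-Gamma evidential head (Amini et al. style): one
    # forward pass yields (gamma, nu, alpha, beta), and both uncertainty
    # types follow in closed form -- no sampling or ensembling needed.
    prediction = gamma                       # E[mu]
    aleatoric = beta / (alpha - 1.0)         # E[sigma^2], irreducible data noise
    epistemic = beta / (nu * (alpha - 1.0))  # Var[mu], model uncertainty
    return prediction, aleatoric, epistemic

# Hypothetical outputs for two rays: same noise estimate, but the second
# ray carries far less "evidence" (small nu), so epistemic uncertainty rises.
pred_a = evidential_uncertainty(gamma=0.8, nu=10.0, alpha=3.0, beta=0.4)
pred_b = evidential_uncertainty(gamma=0.8, nu=0.5, alpha=3.0, beta=0.4)
```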
AI Bullish · arXiv → CS AI · Mar 2 · 7/10 · 15
🧠 Researchers have developed DeBiasLens, a new framework that uses sparse autoencoders to identify and deactivate social bias neurons in Vision-Language models without degrading their performance. The model-agnostic approach addresses concerns about unintended social bias in VLMs by making the debiasing process interpretable and targeting internal model dynamics rather than surface-level fixes.
AI Bullish · arXiv → CS AI · Mar 2 · 7/10 · 12
🧠 Researchers introduce HDFLIM, a new framework that aligns vision and language AI models without computationally expensive fine-tuning: it uses hyperdimensional computing to build cross-modal mappings while keeping the foundation models frozen. The approach achieves performance comparable to traditional training methods while being significantly more resource-efficient.
AI Bullish · arXiv → CS AI · Mar 2 · 7/10 · 17
🧠 Researchers introduced SemVideo, a breakthrough AI framework that can reconstruct videos from brain activity using fMRI scans. The system uses hierarchical semantic guidance to overcome previous limitations in visual consistency and temporal coherence, achieving state-of-the-art results in brain-to-video reconstruction.
$RNDR
AI Bullish · arXiv → CS AI · Mar 2 · 7/10 · 17
🧠 SceneTok introduces a novel 3D scene tokenizer that compresses view sets into permutation-invariant tokens, achieving 1-3 orders of magnitude better compression than existing methods while maintaining state-of-the-art reconstruction quality. The system enables efficient 3D scene generation in 5 seconds using a lightweight decoder that can render novel viewpoints.
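Permutation invariance is the key property here: reordering the input views must not change the tokens. The toy pooled-attention tokenizer below illustrates that property only; it is not SceneTok's actual architecture, and the query matrix is a made-up stand-in for learned parameters.

```python
import numpy as np

def set_tokenize(view_embeddings, num_tokens=4):
    # Toy permutation-invariant tokenizer: each token is a softmax-weighted
    # mean of the view embeddings under a fixed learned query, so the sum
    # over views makes the output independent of view order.
    rng = np.random.default_rng(42)  # stand-in for learned query weights
    queries = rng.normal(size=(num_tokens, view_embeddings.shape[1]))
    attn = queries @ view_embeddings.T                   # (tokens, views)
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ view_embeddings                        # (tokens, dim)

rng = np.random.default_rng(0)
views = rng.normal(size=(6, 32))        # six posed-view embeddings
shuffled = views[[3, 0, 5, 1, 4, 2]]    # same set, different order

tokens_a = set_tokenize(views)
tokens_b = set_tokenize(shuffled)       # identical tokens despite the shuffle
```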
AI Bullish · arXiv → CS AI · Mar 2 · 6/10 · 17
🧠 Researchers have developed LiteReality, a novel pipeline that converts RGB-D scans of indoor environments into compact, realistic 3D virtual replicas suitable for AR/VR, gaming, robotics, and digital twins. The system features scene understanding, object retrieval, material painting, and physics integration to create graphics-ready environments that support object individuality and physically-based rendering.
AI Bullish · arXiv → CS AI · Mar 2 · 6/10 · 15
🧠 Researchers introduce DiffusionHarmonizer, an AI framework that enhances neural reconstruction simulations for autonomous robots by converting multi-step image diffusion models into single-step enhancers. The system addresses artifacts in NeRF and 3D Gaussian Splatting methods while improving realism for applications like self-driving vehicle simulation.
AI Neutral · arXiv → CS AI · Mar 2 · 6/10 · 12
🧠 Researchers introduce DLEBench, the first benchmark specifically designed to evaluate instruction-based image editing models' ability to edit small objects that occupy only 1%–10% of the image area. Testing on 10 models revealed significant performance gaps in small-object editing, highlighting a critical limitation in current AI image editing capabilities.
AI Bullish · arXiv → CS AI · Mar 2 · 7/10 · 12
🧠 Researchers introduce MEGS², a new memory-efficient framework for 3D Gaussian Splatting that reduces memory consumption by 50% for static rendering and 40% for real-time rendering. The breakthrough enables 3D rendering on edge devices by replacing memory-intensive spherical harmonics with lightweight spherical Gaussian lobes and implementing unified pruning optimization.
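The storage argument is easy to see with the standard spherical Gaussian form (MEGS²'s exact parameterization may differ): a lobe is just an axis, a sharpness, and an RGB amplitude, versus 16 RGB coefficients for degree-3 spherical harmonics.

```python
import numpy as np

def eval_spherical_gaussian(view_dir, axis, sharpness, amplitude):
    # One spherical Gaussian lobe: amplitude * exp(sharpness * (v.axis - 1)).
    # It peaks (value = amplitude) when the view direction equals the axis
    # and falls off smoothly as the view rotates away.
    return amplitude * np.exp(sharpness * (view_dir @ axis - 1.0))

axis = np.array([0.0, 0.0, 1.0])
rgb = np.array([1.0, 0.5, 0.2])
on_axis = eval_spherical_gaussian(np.array([0.0, 0.0, 1.0]), axis, 8.0, rgb)
off_axis = eval_spherical_gaussian(np.array([1.0, 0.0, 0.0]), axis, 8.0, rgb)

# Per-primitive storage (illustrative): degree-3 SH needs 16 RGB
# coefficients (48 floats); an SG lobe needs axis (3) + sharpness (1)
# + RGB amplitude (3) = 7 floats, e.g. 2 lobes -> 14 floats.
sh_floats = 16 * 3
sg_floats = 2 * 7
```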
AI Bullish · arXiv → CS AI · Mar 2 · 6/10 · 19
🧠 Researchers introduced BEV-VLM, a new autonomous driving trajectory planning system that combines Vision-Language Models with Bird's-Eye View maps from camera and LiDAR data. The approach achieved 53.1% better planning accuracy and complete collision avoidance compared to vision-only methods on the nuScenes dataset.
AI Bullish · arXiv → CS AI · Mar 2 · 7/10 · 14
🧠 Researchers introduce Max-V1, a novel vision-language model framework that treats autonomous driving as a language problem, predicting trajectories from camera input. The model achieved over 30% performance improvement on the nuScenes dataset and demonstrates strong cross-vehicle adaptability.
AI Bullish · arXiv → CS AI · Mar 2 · 6/10 · 13
🧠 Researchers introduce Draw-In-Mind (DIM), a new approach to multimodal AI models that improves image editing by better balancing responsibilities between understanding and generation modules. The DIM-4.6B model achieves state-of-the-art performance on image editing benchmarks despite having fewer parameters than competing models.
AI Bullish · arXiv → CS AI · Mar 2 · 7/10 · 21
🧠 DeepEyesV2 is a new agentic multimodal AI model that combines text and image comprehension with external tool integration like code execution and web search. The research introduces a two-stage training pipeline and RealX-Bench evaluation framework, demonstrating improved real-world reasoning capabilities through adaptive tool invocation.
AI Bullish · arXiv → CS AI · Mar 2 · 6/10 · 11
🧠 Researchers developed AMBER-AFNO, a new lightweight architecture for 3D medical image segmentation that replaces traditional attention mechanisms with Adaptive Fourier Neural Operators. The model achieves state-of-the-art results on medical datasets while maintaining linear memory scaling and quasi-linear computational complexity.
$NEAR
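The Fourier-operator idea above can be sketched as FFT-based token mixing, which costs O(N log N) in token count instead of attention's O(N²); real AFNO layers additionally apply a shared MLP with soft-thresholding per frequency mode, which this minimal round-trip sketch omits.

```python
import numpy as np

def afno_mix(tokens, weights):
    # Fourier-operator token mixing: FFT along the token axis, an
    # independent complex weight per retained frequency mode and channel,
    # then inverse FFT back to token space.
    spectrum = np.fft.rfft(tokens, axis=0)   # (N//2 + 1, D) complex modes
    spectrum = spectrum * weights            # per-mode channel weighting
    return np.fft.irfft(spectrum, n=tokens.shape[0], axis=0)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(64, 16))              # 64 tokens, 16 channels
weights = np.ones((33, 16), dtype=complex)      # identity weights as a sanity check

mixed = afno_mix(tokens, weights)               # identity weights give a perfect round-trip
```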
AI Neutral · arXiv → CS AI · Mar 2 · 7/10 · 23
🧠 Researchers introduce SWITCH, a new benchmark for testing autonomous AI agents' ability to interact with physical interfaces like switches and appliance panels in real-world scenarios. The benchmark reveals significant gaps in current AI models' capabilities for long-horizon tasks requiring causal reasoning and verification.
AI Neutral · arXiv → CS AI · Mar 2 · 7/10 · 10
🧠 Researchers introduce Veritas, a multi-modal large language model designed for deepfake detection that uses pattern-aware reasoning to mimic human forensic processes. The system addresses real-world challenges through the HydraFake dataset and achieves significant improvements in detecting unseen forgeries across different domains.
AI Bullish · arXiv → CS AI · Mar 2 · 7/10 · 15
🧠 Researchers propose CycleBEV, a new regularization framework that improves bird's-eye-view semantic segmentation for autonomous driving by using cycle consistency to enhance view transformation networks. The method shows significant improvements of up to 4.86 mIoU without increasing inference complexity.
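The shape of a cycle-consistency regularizer is simple: map features forward, map them back, and penalize the reconstruction gap. The toy linear maps below stand in for the paper's learned view-transformation networks; they are purely illustrative.

```python
import numpy as np

# Cycle-consistency check in miniature: a forward transform F
# ("image -> BEV") and an inverse G ("BEV -> image"). The regularizer
# penalizes ||x - G(F(x))||, needs no extra labels, and adds nothing at
# inference time since only F is used for prediction.
rng = np.random.default_rng(0)
F = rng.normal(size=(32, 32))          # stand-in image -> BEV transform
G = np.linalg.inv(F)                   # a perfect inverse closes the cycle

x = rng.normal(size=32)                # stand-in image features
cycle_error_good = np.linalg.norm(x - G @ (F @ x))

G_bad = G + 0.5 * rng.normal(size=(32, 32))   # a poor inverse leaves a gap
cycle_error_bad = np.linalg.norm(x - G_bad @ (F @ x))
```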
AI Bullish · arXiv → CS AI · Feb 27 · 6/10 · 5
🧠 BetterScene is a new AI approach that enhances 3D scene synthesis and novel view generation from sparse photos by leveraging Stable Video Diffusion with improved regularization techniques. The method integrates 3D Gaussian Splatting and addresses consistency issues in existing diffusion-based solutions through temporal equivariance and vision foundation model alignment.
$RNDR
AI Bullish · arXiv → CS AI · Feb 27 · 6/10 · 7
🧠 Researchers have developed AeroDGS, a physics-guided 4D Gaussian splatting framework that enables accurate dynamic scene reconstruction from single-view aerial UAV footage. The system addresses key challenges in monocular aerial reconstruction by incorporating physics-based optimization and geometric constraints to resolve depth ambiguity and improve motion estimation.
AI Bullish · arXiv → CS AI · Feb 27 · 6/10 · 8
🧠 Researchers developed AVDE, a lightweight framework for decoding visual information from EEG brain signals using autoregressive generation. The system outperforms existing methods while using only 10% of the parameters, potentially advancing practical brain-computer interface applications.
AI Bullish · arXiv → CS AI · Feb 27 · 6/10 · 7
🧠 Researchers developed a deep learning framework using Organ Focused Attention (OFA) to predict renal tumor malignancy from 3D CT scans without requiring manual segmentation. The system achieved AUC scores of 0.685-0.760 across datasets, outperforming traditional segmentation-based approaches while reducing labor and costs.
AI Bullish · arXiv → CS AI · Feb 27 · 6/10 · 6
🧠 Researchers propose Q², a new framework that addresses gradient imbalance issues in quantization-aware training for complex visual tasks like object detection and image segmentation. The method achieves significant performance improvements (+2.5% mAP for object detection, +3.7% mDICE for segmentation) while introducing no inference-time overhead.
$ADA
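For context, the baseline such methods build on is uniform fake quantization in the forward pass (with a straight-through estimator in the backward pass); Q²'s contribution is balancing gradients across quantizers like this, which the sketch below does not show.

```python
import numpy as np

def fake_quant(x, num_bits=8):
    # Uniform fake quantization as used in quantization-aware training:
    # quantize-dequantize in the forward pass so the network trains against
    # rounding error. In backprop, the round() would be skipped via the
    # straight-through estimator (its gradient taken as identity).
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

x = np.linspace(-1.0, 1.0, 101)
err8 = np.abs(x - fake_quant(x, num_bits=8)).max()  # fine grid, tiny error
err2 = np.abs(x - fake_quant(x, num_bits=2)).max()  # coarse grid, large error
```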