y0news

#computer-vision News & Analysis

511 articles tagged with #computer-vision. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Mar 3 · 4/10

Deformation-Free Cross-Domain Image Registration via Position-Encoded Temporal Attention

Researchers developed GPEReg-Net, a new AI method for cross-domain image registration that eliminates the need for explicit deformation field estimation by decomposing images into domain-invariant scene representations and appearance statistics. The system achieves state-of-the-art performance on benchmarks while running 1.87x faster than existing methods, using position-encoded temporal attention for sequential image processing.
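The paper's architecture isn't reproduced here, but the ingredient the summary names, attention over a frame sequence with positions encoded into the features, can be sketched in a few lines. Sinusoidal encodings and a single attention head are illustrative assumptions, not GPEReg-Net's actual design:

```python
import numpy as np

def sinusoidal_pe(seq_len, d_model):
    """Standard sinusoidal position encoding over the frame index."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def temporal_attention(frames, d_model):
    """Self-attention over per-frame feature vectors, with the frame's
    temporal position added to its features before attending."""
    x = frames + sinusoidal_pe(len(frames), d_model)
    scores = x @ x.T / np.sqrt(d_model)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)          # softmax over the sequence
    return w @ x, w

frames = np.random.default_rng(0).normal(size=(5, 16))  # 5 frames, 16-dim features
out, attn = temporal_attention(frames, 16)
```

Because position is injected into the features rather than estimated as a deformation field, the sequence can be processed in one attention pass.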

AI · Bullish · arXiv – CS AI · Mar 3 · 5/10

From Scale to Speed: Adaptive Test-Time Scaling for Image Editing

Researchers introduce ADE-CoT (Adaptive Edit-CoT), a new test-time scaling framework that improves image editing efficiency by 2x while maintaining superior performance. The system uses dynamic resource allocation, edit-specific verification, and opportunistic stopping to optimize the image editing process compared to traditional methods.

AI · Neutral · arXiv – CS AI · Mar 2 · 5/10

Modelling and Simulation of Neuromorphic Datasets for Anomaly Detection in Computer Vision

Researchers introduce ANTShapes, a Unity-based simulation framework that generates synthetic neuromorphic vision datasets to address the scarcity of Dynamic Vision Sensor data. The tool creates configurable 3D scenes with randomly-behaving objects for training anomaly detection and object recognition systems in event-based computer vision.
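For readers new to event-based vision, the kind of data ANTShapes synthesizes can be illustrated with the standard DVS contrast-threshold model. This sketch converts ordinary frames into events; it is not the ANTShapes pipeline itself:

```python
import numpy as np

def frames_to_events(frames, threshold=0.2, eps=1e-6):
    """Approximate a Dynamic Vision Sensor: emit an event wherever the
    log-intensity change between consecutive frames exceeds a contrast
    threshold. Returns (t, y, x, polarity) tuples."""
    log_f = np.log(frames.astype(np.float64) + eps)
    events = []
    for t in range(1, len(log_f)):
        diff = log_f[t] - log_f[t - 1]
        ys, xs = np.nonzero(np.abs(diff) > threshold)
        for y, x in zip(ys, xs):
            events.append((t, y, x, 1 if diff[y, x] > 0 else -1))
    return events

# a bright dot moving one pixel right per frame on a 4x4 sensor
frames = np.zeros((3, 4, 4))
frames[0, 2, 0] = frames[1, 2, 1] = frames[2, 2, 2] = 1.0
ev = frames_to_events(frames)
```

Each frame-to-frame move produces one ON event at the dot's new position and one OFF event at the old one, which is exactly the sparse signal an anomaly detector trained on such data consumes.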

AI · Neutral · arXiv – CS AI · Feb 27 · 4/10

CGSA: Class-Guided Slot-Aware Adaptation for Source-Free Object Detection

Researchers introduce CGSA, a new framework for source-free domain adaptive object detection that integrates Object-Centric Learning into DETR-based detectors. The approach uses Hierarchical Slot Awareness and Class-Guided Slot Contrast modules to improve cross-domain object detection without retaining source data, demonstrating superior performance on multiple datasets.

AI · Neutral · arXiv – CS AI · Feb 27 · 4/10

Instruction-based Image Editing with Planning, Reasoning, and Generation

Researchers propose a new multi-modality approach for instruction-based image editing that combines Chain-of-Thought planning, region reasoning, and generation capabilities. The method uses large language models and diffusion models to improve complex image editing tasks compared to existing single-modality approaches.

AI · Bullish · arXiv – CS AI · Feb 27 · 4/10

AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation

Researchers introduce Alignment-Aware Masked Learning (AML), a new training strategy for Referring Image Segmentation that improves pixel-level vision-language alignment. The approach achieves state-of-the-art performance on RefCOCO datasets by filtering poorly aligned regions and focusing on reliable vision-language cues.
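The filtering idea, dropping poorly aligned regions from the training signal, can be sketched abstractly. The per-region losses, alignment scores, and 0.5 threshold below are illustrative, not AML's actual formulation:

```python
import numpy as np

def masked_alignment_loss(region_losses, align_scores, tau=0.5):
    """Average the per-region loss over regions whose vision-language
    alignment score clears a threshold; poorly aligned regions are
    masked out so they cannot pollute the gradient."""
    mask = (align_scores >= tau).astype(float)
    return float((region_losses * mask).sum() / max(mask.sum(), 1.0))

losses = np.array([0.9, 0.2, 0.4, 1.5])
scores = np.array([0.8, 0.9, 0.3, 0.1])  # last two regions poorly aligned
loss = masked_alignment_loss(losses, scores)
```

Only the two well-aligned regions contribute, so a noisy region with a large loss (like the fourth one here) is ignored rather than dominating training.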

AI · Bullish · arXiv – CS AI · Feb 27 · 4/10

SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation

Researchers introduce SeeThrough3D, a new AI model that improves 3D layout-conditioned image generation by explicitly modeling object occlusions. The model uses an occlusion-aware 3D scene representation with translucent boxes to better understand depth relationships and generate more realistic partially occluded objects in synthetic scenes.

AI · Neutral · arXiv – CS AI · Feb 27 · 4/10

A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys

Researchers developed a semi-supervised machine learning pipeline using vision transformers and k-Nearest Neighbor classifiers to automatically detect poor-quality exposures in astronomical imaging surveys. The method was successfully applied to the DECam Legacy Survey, identifying 780 problematic exposures that were verified through visual inspection.
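The pipeline's pattern, embed every exposure with a pretrained vision transformer, hand-label a few, and let k-Nearest Neighbors propagate labels to the rest, can be sketched with synthetic embeddings standing in for ViT features:

```python
import numpy as np

def knn_label(embeddings, labeled_idx, labels, query_idx, k=3):
    """Label unlabeled exposures by majority vote among the k nearest
    labeled embeddings (Euclidean distance in feature space)."""
    out = {}
    for q in query_idx:
        d = np.linalg.norm(embeddings[labeled_idx] - embeddings[q], axis=1)
        nearest = np.argsort(d)[:k]
        out[q] = int(np.round(labels[nearest].mean()))  # 0 = good, 1 = bad
    return out

rng = np.random.default_rng(0)
good = rng.normal(0.0, 0.3, size=(20, 8))  # cluster of normal exposures
bad = rng.normal(3.0, 0.3, size=(5, 8))    # outlying bad exposures
emb = np.vstack([good, bad])
labeled = np.array([0, 1, 2, 20, 21])      # a few visually inspected images
y = np.array([0, 0, 0, 1, 1])
pred = knn_label(emb, labeled, y, query_idx=[10, 23], k=3)
```

Because bad exposures form outlying clusters in the embedding space, a handful of inspected examples is enough to sweep a whole survey, which is how the 780 DECam exposures could then be confirmed by eye.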

AI · Neutral · arXiv – CS AI · Feb 27 · 4/10

PCReg-Net: Progressive Contrast-Guided Registration for Cross-Domain Image Alignment

Researchers have developed PCReg-Net, a lightweight AI framework for cross-domain image registration that achieves real-time performance at 141 FPS with only 2.56M parameters. The system uses a progressive contrast-guided approach with four modules to align images across different domains, showing improvements over traditional and deep learning baselines on retinal and microscopy benchmarks.

AI · Bullish · arXiv – CS AI · Feb 27 · 4/10

DICArt: Advancing Category-level Articulated Object Pose Estimation in Discrete State-Spaces

Researchers introduced DICArt, a new AI framework for articulated object pose estimation that uses discrete diffusion processes instead of continuous space regression. The method incorporates kinematic constraints and hierarchical structure modeling to improve accuracy in estimating 6D poses of complex objects in embodied AI applications.

AI · Bullish · MIT News – AI · Feb 4 · 4/10

Antonio Torralba, three MIT alumni named 2025 ACM fellows

Antonio Torralba and three MIT alumni have been named 2025 ACM Fellows in recognition of their contributions to computer science. Torralba's research focuses on computer vision, machine learning, and human visual perception.

AI · Neutral · IEEE Spectrum – AI · Jan 12 · 4/10

Machine-Learning System Monitors Patient Pain During Surgery

Researchers developed a contactless machine-learning system that monitors patient pain during surgery by analyzing facial expressions and heart rate data via remote photoplethysmogram (rPPG). The system achieved 45% accuracy when tested on realistic surgical footage, offering a non-invasive alternative to traditional pain monitoring methods that require wired sensors.
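The rPPG component works by recovering the pulse from tiny periodic brightness changes in facial skin. A minimal frequency-domain sketch follows; the band limits and synthetic signal are illustrative, not the researchers' implementation:

```python
import numpy as np

def rppg_heart_rate(brightness_trace, fps):
    """Estimate heart rate from a face-region brightness trace (rPPG):
    find the dominant frequency within the plausible heart-rate band."""
    sig = brightness_trace - brightness_trace.mean()
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fps)
    power = np.abs(np.fft.rfft(sig)) ** 2
    band = (freqs >= 0.7) & (freqs <= 4.0)   # 42-240 BPM
    peak = freqs[band][np.argmax(power[band])]
    return peak * 60.0

# synthetic 10 s trace at 30 fps with a 1.2 Hz (72 BPM) pulse plus noise
fps = 30
t = np.arange(fps * 10) / fps
trace = np.sin(2 * np.pi * 1.2 * t) + 0.1 * np.random.default_rng(0).normal(size=t.size)
bpm = rppg_heart_rate(trace, fps)
```

Restricting the search to the physiological band rejects motion and lighting artifacts outside plausible heart rates, which is what makes the method viable on real surgical footage.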

AI · Neutral · Google DeepMind Blog · Nov 11 · 4/10

Teaching AI to see the world more like we do

A new research paper examines how AI systems perceive and organize visual information differently from humans. The study analyzes the fundamental differences in visual processing between artificial intelligence and human cognition.

AI · Bullish · Google Research Blog · Oct 1 · 4/10

Introducing interactive on-device segmentation in Snapseed

Google's Snapseed photo editing app introduces interactive on-device segmentation technology, allowing users to select and edit specific objects in photos directly on their device. This represents an advancement in mobile AI-powered image processing capabilities without requiring cloud connectivity.

AI · Bullish · Hugging Face Blog · May 21 · 5/10

nanoVLM: The simplest repository to train your VLM in pure PyTorch

nanoVLM is introduced as a simplified repository for training Vision Language Models (VLMs) using pure PyTorch. The project aims to make VLM training more accessible by providing a streamlined approach without complex dependencies.
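The typical wiring such a minimal VLM codebase implements is: vision-encoder features projected into the language model's embedding space and prepended to the text tokens. In outline (shapes and names here are illustrative, not nanoVLM's API):

```python
import numpy as np

# toy dimensions: a real VLM uses a ViT encoder and a transformer LM
n_patches, d_vision, d_lm, n_text = 9, 32, 48, 5

rng = np.random.default_rng(0)
patch_feats = rng.normal(size=(n_patches, d_vision))  # vision-encoder output
W_proj = rng.normal(size=(d_vision, d_lm)) * 0.1      # learned projection layer

img_tokens = patch_feats @ W_proj                     # map patches into LM space
text_embeds = rng.normal(size=(n_text, d_lm))         # text token embeddings
lm_input = np.vstack([img_tokens, text_embeds])       # image tokens prefix the text
```

Training then reduces to ordinary next-token prediction on `lm_input`, which is why the whole pipeline fits in plain PyTorch without extra dependencies.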

AI · Neutral · Hugging Face Blog · Apr 22 · 4/10

Finetuning olmOCR to be a faithful OCR-Engine

The article describes fine-tuning olmOCR, an optical character recognition engine, to improve its accuracy and reliability. This represents an advancement in AI-powered text recognition technology that could have applications across various digital platforms.

AI · Bullish · Hugging Face Blog · Jan 24 · 4/10

We now support VLMs in smolagents!

The article title indicates that smolagents now supports Vision Language Models (VLMs), representing a technical advancement in AI agent capabilities. However, the article body appears to be empty, limiting detailed analysis of the implementation or implications.

AI · Neutral · Hugging Face Blog · Jan 16 · 4/10

Timm ❤️ Transformers: Use any timm model with transformers

The article appears to be about integrating timm (PyTorch Image Models) with Hugging Face Transformers library, allowing users to utilize any timm model within the transformers ecosystem. This represents a technical development in AI model interoperability and tooling.

AI · Neutral · Hugging Face Blog · Jul 10 · 4/10

Preference Optimization for Vision Language Models

The article title indicates a focus on preference optimization techniques for Vision Language Models, which are AI systems that process both visual and textual information. This represents ongoing research in improving how these multimodal AI models align with human preferences and perform tasks.
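The summary does not say which objective is used, but Direct Preference Optimization (DPO) is a widely used choice for aligning such models with human preferences. Its per-pair loss can be sketched as follows (the log-probabilities below are made-up numbers for illustration):

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO: push the policy to prefer the chosen response over the
    rejected one, measured relative to a frozen reference model."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))  # -log(sigmoid(margin))

# policy already prefers the chosen caption -> small loss
low = dpo_loss(-5.0, -9.0, -6.0, -6.0)
# policy prefers the rejected caption -> large loss
high = dpo_loss(-9.0, -5.0, -6.0, -6.0)
```

For a VLM the log-probabilities are computed over caption tokens conditioned on the image, but the objective itself is unchanged from the text-only case.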

AI · Neutral · Hugging Face Blog · Jun 24 · 5/10

Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models

The article discusses fine-tuning Florence-2, Microsoft's advanced vision language model that combines computer vision and natural language processing capabilities. However, the article body appears to be empty or incomplete, limiting detailed analysis of the technical implementation or market implications.

Page 19 of 21