#object-detection News & Analysis

46 articles tagged with #object-detection. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

FineGen: A VLM-based Multi-Agent Framework for Fine-Grained Image-Text Dataset Construction

FineGen is a VLM-based multi-agent framework that automatically constructs vision-language datasets by generating hard negative samples through a Generation-Verification-Correction pipeline. The resulting FineGen-100K dataset contains 147,000+ attribute-specific hard negatives and demonstrates a 14.4% accuracy improvement on fine-grained object detection benchmarks, addressing a critical gap in existing datasets.

AI · Bearish · arXiv – CS AI · Jun 2 · 7/10

Digital-to-Physical Transfer of Adversarial Patches for Aerial Vehicle Detection

Researchers demonstrate that adversarial patches (printable patterns designed to fool AI object detectors) can be physically deployed against aerial vehicle detection systems with significant effectiveness. The study reveals that patches placed directly on vehicles outperform digitally optimized designs in real-world conditions, exposing critical vulnerabilities in deep neural network-based detection systems used for surveillance and monitoring applications.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Researchers introduce LocateAnything, a new vision-language model framework that uses Parallel Box Decoding to detect and localize objects simultaneously rather than sequentially, improving both inference speed and accuracy. The team curated a 138-million-sample dataset and demonstrated significant performance improvements across multiple benchmarks.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

XiYOLO: Energy-Aware Object Detection via Iterative Architecture Search and Scaling

XiYOLO is a new energy-efficient object detection framework that uses neural architecture search and scaling techniques to optimize AI models for edge devices with strict power constraints. The system achieves 20-53% energy reductions compared to YOLOv12 baselines across GPU and NPU deployments while maintaining competitive accuracy metrics.

AI · Bullish · arXiv – CS AI · Apr 13 · 7/10

Neural Distribution Prior for LiDAR Out-of-Distribution Detection

Researchers propose Neural Distribution Prior (NDP), a framework that significantly improves LiDAR-based out-of-distribution detection for autonomous driving by modeling prediction distributions and adaptively reweighting OOD scores. The approach achieves a 10x performance improvement over previous methods on benchmark tests, addressing critical safety challenges in open-world autonomous vehicle perception.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 3

Beyond Prompt Degradation: Prototype-guided Dual-pool Prompting for Incremental Object Detection

Researchers propose PDP, a new framework for Incremental Object Detection that addresses prompt degradation issues in AI models. The method achieves significant improvements of 9.2% AP on MS-COCO and 3.3% AP on PASCAL VOC benchmarks through dual-pool prompt decoupling and prototype-guided pseudo-label generation.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 3

IoUCert: Robustness Verification for Anchor-based Object Detectors

Researchers introduce IoUCert, a new formal verification framework that enables robustness verification for anchor-based object detection models like SSD, YOLOv2, and YOLOv3. The breakthrough uses novel coordinate transformations and Interval Bound Propagation to overcome previous limitations in verifying object detection systems against input perturbations.
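For readers unfamiliar with Interval Bound Propagation, the following is a minimal sketch of the general idea, not IoUCert's implementation: an input box of radius `eps` is pushed through one affine layer and a ReLU, giving guaranteed lower and upper bounds on every output. The weights, bias, and `eps` are made-up illustrative values, and the function names are hypothetical.

```python
def affine_bounds(lower, upper, weights, bias):
    """Propagate the box [lower, upper] through y = W x + b."""
    out_lower, out_upper = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            # A non-negative weight maps the lower input bound to the lower
            # output bound; a negative weight swaps the two.
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out_lower.append(lo)
        out_upper.append(hi)
    return out_lower, out_upper


def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to each bound directly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


# Illustrative (assumed) values: a 2-D input perturbed by at most eps per coordinate.
x = [1.0, -2.0]
eps = 0.1
lower = [v - eps for v in x]
upper = [v + eps for v in x]
W = [[1.0, -1.0], [0.5, 2.0]]
b = [0.0, 1.0]

l1, u1 = relu_bounds(*affine_bounds(lower, upper, W, b))
print(l1, u1)
```

Any perturbed input inside the box is guaranteed to produce outputs inside `[l1, u1]`. Verifiers then check the property of interest against these bounds rather than against individual inputs.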

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 7

SUPERGLASSES: Benchmarking Vision Language Models as Intelligent Agents for AI Smart Glasses

Researchers introduce SUPERGLASSES, the first comprehensive benchmark for evaluating Vision Language Models in AI smart glasses applications, comprising 2,422 real-world egocentric image-question pairs. They also propose SUPERLENS, a multimodal agent that outperforms GPT-4o by 2.19% through retrieval-augmented answer generation with automatic object detection and web search capabilities.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

MIRCaps: A Large-Scale Mixed-Domain Dataset with Image-Level and Region-Level Captions for Fine-Grained Vision-Language Learning

Researchers introduce MIRCaps, a large-scale multimodal dataset containing 141,364 images with 981,947 image-level and 1,742,264 region-level captions designed to improve Vision-Language Models (VLMs) for general imagery and CCTV surveillance applications. The dataset demonstrates effective fine-tuning of lightweight VLMs across image captioning and object detection tasks, with code and data publicly available.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Rethinking Object-Centric Representations for Video Dynamics Modeling

Researchers introduce STAITUS, a machine learning framework that improves unsupervised video object tracking by explicitly separating appearance features from geometric pose information in slot-based representations. The approach addresses a fundamental problem where enforcing temporal consistency causes models to mistrack moving objects and fragment identities, achieving superior performance on tracking stability and segmentation quality.

AI · Neutral · arXiv – CS AI · Jun 23 · 5/10

YOLO26 vs. YOLOv8: A Comprehensive Architectural Benchmark of Next-Generation Real-Time Object Detection Models

Researchers conducted a comprehensive benchmark comparing YOLO26, a new NMS-free object detection model, against YOLOv8 across multiple datasets and hardware configurations. While YOLO26 demonstrated superior accuracy on general object detection tasks, YOLOv8 maintained faster GPU inference speeds, revealing that architectural innovations don't guarantee universal performance advantages.
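As context for the term "NMS-free", here is a minimal sketch of classical greedy non-maximum suppression, the post-processing step that such models aim to remove. It is a generic textbook version, not code from YOLO26 or YOLOv8; the boxes, scores, and the 0.5 threshold are illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes kept, visiting them from highest score down."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Illustrative (assumed) detections: the first two overlap heavily.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)  # [0, 2]
```

The loop is sequential and its cost depends on the number of candidate boxes, which is one reason removing it can change the latency profile of a detector.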

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Automatic Vehicle Detection using DETR: A Transformer-Based Approach for Navigating Treacherous Roads

Researchers have successfully applied Detection Transformer (DETR), a hybrid CNN-Transformer architecture, to vehicle detection in complex driving environments, achieving superior accuracy compared to traditional methods like YOLO. The study introduces Co-DETR with improved training schemes and demonstrates practical advantages for autonomous vehicle navigation across diverse lighting and road conditions.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

The Power of Light: Improving Synthetic-to-Real Domain Adaptation through Physically-Based Indirect Illumination

Researchers present SmartSDG, an automated pipeline using physically-based rendering to improve synthetic-to-real domain adaptation for object detection. The study demonstrates that indirect lighting and complex backgrounds significantly reduce the performance gap between synthetic training data and real-world applications, with implications for industrial automation and computer vision systems.

🏢 Nvidia

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

Bring My Cup! Personalizing Vision-Language-Action Models with Visual Attentive Prompting

Researchers introduce Visual Attentive Prompting (VAP), a training-free method that enables Vision-Language-Action models to perform personalized object manipulation tasks by using reference images to identify specific instances of objects. The approach bridges the gap between semantic understanding and instance-level control, allowing robots to execute commands like 'bring my cup' by distinguishing target objects from visually similar alternatives without requiring model retraining.

AI · Bullish · arXiv – CS AI · Jun 11 · 6/10

SDQM: Synthetic Data Quality Metric for Object Detection Dataset Evaluation

Researchers have introduced SDQM (Synthetic Dataset Quality Metric), a novel evaluation framework for assessing the quality of synthetically generated data used in object detection tasks without requiring full model training. The metric demonstrates strong correlation with YOLO11 performance metrics and provides actionable insights for dataset improvement, addressing a critical bottleneck in resource-constrained machine learning development.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

Democratising Camera Trap AI: An Open-Source Model for Detecting UK Mammals

Researchers have released an open-source AI model for detecting UK mammals and birds from camera trap images, trained on 48,165 labeled instances with 98.4% mean average precision. The democratization effort aims to counter commercial platforms by providing ecologists with accessible tools for biodiversity monitoring, distributed under a non-commercial license.

AI · Neutral · arXiv – CS AI · Jun 9 · 5/10

PolyBuild: An End-to-End Method for Polygonal Building Contour Extraction from High-Resolution Remote Sensing Images

PolyBuild introduces an end-to-end deep learning method for extracting building polygon contours directly from high-resolution remote sensing images without post-processing. The hybrid CNN-Transformer architecture combines an Initial Contour Generation Module with a Contour Optimization Module to achieve superior performance over existing mask-based and contour-based approaches.

AI · Neutral · arXiv – CS AI · Jun 9 · 5/10

Proposal Refinement for Few-Shot Object Detection

Researchers propose a proposal refinement approach for few-shot object detection that addresses the unbalanced distribution of region proposals between novel and base classes. The method introduces a refinement loss during base training and a refinement branch for RPN during fine-tuning, achieving 1-6% performance improvements on benchmarks without additional inference costs.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Attention-Guided Autoencoder Fusion for Insulator Defect Detection Using UAV Transmission-Line Imaging

Researchers developed AE-YOLO, an advanced deep learning framework combining autoencoders with YOLO object detection for identifying defects in high-voltage transmission-line insulators using UAV imagery. The system achieves 95.10% mAP performance, substantially outperforming existing YOLO baselines and offering a scalable solution for critical infrastructure inspection.

AI · Neutral · arXiv – CS AI · Jun 5 · 5/10

HDST-GNN: Heterogeneous Dynamic Spatiotemporal Graph Neural Networks for Multi-Object Tracking in UAV Aerial Imagery

Researchers introduce HDST-GNN, a graph neural network designed to improve multi-object tracking in drone footage by accounting for varying altitudes, object occlusion, and different detection states. The model achieves significant performance gains over existing methods, reducing identity-switching errors by up to 81% on benchmark datasets.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

Real-Time Automatic License Plate Recognition Using YOLOv8, SORT Tracking, and Temporal Data Interpolation

Researchers present an automated license plate recognition system combining YOLOv8 object detection, SORT multi-object tracking, and temporal data interpolation to improve real-time video processing in traffic monitoring. The five-stage pipeline addresses challenges like variable lighting, high vehicle speeds, and occlusion that traditionally degrade recognition accuracy and tracking consistency.
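Temporal interpolation of this kind can be illustrated with a minimal sketch that fills frames where a tracked plate was not detected by linearly interpolating between the nearest detections on either side. This is an assumed, simplified version; the paper's actual interpolation scheme is not described in the summary, and the function name and track data are hypothetical.

```python
def interpolate_track(track):
    """Fill missing frames in one track by linear interpolation.

    `track` maps frame index -> (x1, y1, x2, y2) for frames with a detection.
    Returns a new dict that also covers the frames between detections.
    """
    frames = sorted(track)
    filled = dict(track)
    for start, end in zip(frames, frames[1:]):
        gap = end - start
        for f in range(start + 1, end):
            t = (f - start) / gap  # fraction of the way from start to end
            filled[f] = tuple(
                a + t * (b - a) for a, b in zip(track[start], track[end])
            )
    return filled


# Illustrative (assumed) track: the plate is detected in frames 0 and 4 only.
track = {0: (100.0, 50.0, 140.0, 70.0), 4: (120.0, 58.0, 160.0, 78.0)}
filled = interpolate_track(track)
print(filled[2])  # (110.0, 54.0, 150.0, 74.0)
```

Linear interpolation assumes roughly constant motion across the gap, so it is most plausible for short dropouts such as brief occlusions.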

AI · Bullish · arXiv – CS AI · Jun 4 · 6/10

HYolo: An Intelligent IoT-Based Object Detection System Using Hypergraph Learning

HYolo introduces a hypergraph learning framework integrated into the YOLO object detection architecture to capture high-order feature relationships beyond traditional pairwise interactions. The system demonstrates a 12% mAP@50 improvement on COCO datasets, offering enhanced contextual understanding for IoT-based vision applications.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

Instance-Level Post Hoc Uncertainty Quantification in Object Detection

Researchers propose MC-GLM, a novel method for quantifying uncertainty in object detection predictions without model retraining, using Laplace approximation and Monte Carlo sampling. The technique enables efficient, instance-level uncertainty estimates critical for autonomous driving safety, validated on the nuScenes dataset with the CenterPoint detector.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

MoEIoU: Rethinking Bounding-Box Regression as a Mixture of Experts

Researchers introduce MoEIoU, a novel machine learning approach that reformulates bounding-box regression for object detection using a mixture-of-experts framework. The method dynamically balances multiple localization objectives during training, outperforming existing solutions across standard benchmarks and architectures.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

PInVerify: An Offline Embodied Benchmark for Active Instance Verification

Researchers introduce PInVerify, an offline benchmark for training embodied AI agents to verify whether objects match fine-grained descriptions through active viewpoint selection. The benchmark includes 3,000 episodes across 18 object categories and evaluates multimodal language models at on-device scale, with best results reaching 85.6% accuracy using fine-tuned approaches.
