y0news

#object-detection News & Analysis

28 articles tagged with #object-detection. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 5d ago · 7/10

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Researchers introduce LocateAnything, a new vision-language model framework that uses Parallel Box Decoding to detect and localize objects simultaneously rather than sequentially, improving both inference speed and accuracy. The team curated a 138-million-sample dataset and demonstrated significant performance improvements across multiple benchmarks.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

XiYOLO: Energy-Aware Object Detection via Iterative Architecture Search and Scaling

XiYOLO is a new energy-efficient object detection framework that uses neural architecture search and scaling techniques to optimize AI models for edge devices with strict power constraints. The system achieves 20-53% energy reductions compared to YOLOv12 baselines across GPU and NPU deployments while maintaining competitive accuracy metrics.

AI · Bullish · arXiv – CS AI · Apr 13 · 7/10

Neural Distribution Prior for LiDAR Out-of-Distribution Detection

Researchers propose Neural Distribution Prior (NDP), a framework that significantly improves LiDAR-based out-of-distribution detection for autonomous driving by modeling prediction distributions and adaptively reweighting OOD scores. The approach achieves a 10x performance improvement over previous methods on benchmark tests, addressing critical safety challenges in open-world autonomous vehicle perception.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 3

IoUCert: Robustness Verification for Anchor-based Object Detectors

Researchers introduce IoUCert, a formal verification framework that enables robustness verification for anchor-based object detection models such as SSD, YOLOv2, and YOLOv3. It uses novel coordinate transformations and Interval Bound Propagation to overcome previous limitations in verifying object detection systems against input perturbations.
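
As a rough illustration of the Interval Bound Propagation idea named in this summary, the sketch below pushes an axis-aligned input box through one linear layer and a ReLU. It is not IoUCert's implementation, and the function names `ibp_linear` and `ibp_relu` are hypothetical.

```python
def ibp_linear(lower, upper, weights, bias):
    """Propagate elementwise input bounds through y = W x + b.

    lower/upper are lists of per-input bounds; weights is a list of rows.
    """
    out_lower, out_upper = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                # A non-negative weight keeps the bound order.
                lo += w * l
                hi += w * u
            else:
                # A negative weight swaps which input bound is the worst case.
                lo += w * u
                hi += w * l
        out_lower.append(lo)
        out_upper.append(hi)
    return out_lower, out_upper


def ibp_relu(lower, upper):
    """ReLU is monotonic, so it can be applied to each bound directly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]
```

The resulting bounds are sound but can be loose, since each output is bounded independently of the others.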

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 7

SUPERGLASSES: Benchmarking Vision Language Models as Intelligent Agents for AI Smart Glasses

Researchers introduce SUPERGLASSES, the first comprehensive benchmark for evaluating Vision Language Models in AI smart glasses applications, comprising 2,422 real-world egocentric image-question pairs. They also propose SUPERLENS, a multimodal agent that outperforms GPT-4o by 2.19% through retrieval-augmented answer generation with automatic object detection and web search capabilities.

AI · Neutral · arXiv – CS AI · 14h ago · 6/10

PInVerify: An Offline Embodied Benchmark for Active Instance Verification

Researchers introduce PInVerify, an offline benchmark for training embodied AI agents to verify whether objects match fine-grained descriptions through active viewpoint selection. The benchmark includes 3,000 episodes across 18 object categories and evaluates multimodal language models at on-device scale, with best results reaching 85.6% accuracy using fine-tuned approaches.

AI · Neutral · arXiv – CS AI · 14h ago · 6/10

CaptionFormer: Unified Segmentation, Tracking, and Captioning for Spatio-Temporal Objects

Researchers introduce CaptionFormer, an end-to-end model that simultaneously detects, segments, tracks, and captions objects in video sequences. The work addresses Dense Video Object Captioning by generating synthetic training data using vision-language models and extends existing datasets, achieving state-of-the-art results across multiple benchmarks.

AI · Bullish · arXiv – CS AI · 3d ago · 6/10

GiPL: Generative augmented iterative Pseudo-Labeling for Cross-Domain Few-Shot Object Detection

Researchers propose GiPL, a two-branch machine learning framework that combines iterative pseudo-labeling with generative data augmentation to improve cross-domain few-shot object detection using vision-language models. The method demonstrates significant performance improvements on three benchmark datasets, addressing critical challenges in fine-tuning with limited target-domain samples.

AI · Bullish · arXiv – CS AI · 5d ago · 6/10

FAST-GOAL: Fast and Efficient Global-local Object Alignment Learning

Researchers introduce FAST-GOAL, a fine-tuning method that improves CLIP's ability to process lengthy text descriptions through global-local semantic alignment. The approach combines object detection with token-level similarity learning and introduces GLIT100k, a new dataset linking long captions to localized image-text pairs, demonstrating significant performance gains across multiple benchmarks.

AI · Neutral · arXiv – CS AI · 5d ago · 6/10

Semantic Robustness Probing via Inpainting: An Interactive Tool for Safety-Critical Object Detection

SemProbe is a new interactive tool for testing object detection systems in safety-critical applications using semantically meaningful image corruptions rather than simple pixel-level noise. The system uses diffusion-based inpainting to generate realistic test scenarios, automatically runs model inference, and logs results as structured artifacts for safety evaluation compliance.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

Benchmarking ResNet Backbones in RT-DETR: Impact of Depth and Regularization under environmental conditions

This research benchmarks RT-DETR object detection models with different ResNet backbones for competitive robotics applications, evaluating how environmental variations such as lighting and background contrast affect detection performance. The study finds that intermediate-depth models (ResNet34 and ResNet50) offer the best balance between accuracy, confidence, and latency, with ResNet50 excelling under illumination changes and ResNet34 performing best under background variations.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

LAGO: Language-Guided Adaptive Object-Region Focus for Zero-Shot Visual-Text Alignment

Researchers introduce LAGO, a framework for zero-shot visual-text alignment that improves classification accuracy by intelligently focusing on relevant image regions rather than analyzing entire images. The method reduces computational cost while avoiding error-amplification feedback loops that plague existing localized alignment approaches.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

RigidFormer: Learning Rigid Dynamics using Transformers

RigidFormer is a Transformer-based neural network that learns rigid-body dynamics simulation from mesh-free point cloud inputs, addressing computational bottlenecks in existing mesh-dependent methods. The model uses object-level reasoning with anchor-based attention mechanisms and enforces physical rigidity constraints through differentiable Kabsch alignment, demonstrating superior performance and generalization across benchmarks.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

CrossVL: Complexity-Aware Feature Routing and Paired Curriculum for Cross-View Vision-Language Detection

CrossVL introduces a novel framework combining Complexity-Aware Pathway Aggregation and Paired Curriculum Learning to improve vision-language model performance in cross-view object detection scenarios. The approach addresses fundamental challenges when models operate across different viewpoints (ground and aerial), achieving measurable improvements in detection accuracy and consistency on the MAVREC dataset.

AI · Bullish · arXiv – CS AI · May 11 · 6/10

RELO: Reinforcement Learning to Localize for Visual Object Tracking

Researchers introduce RELO, a reinforcement learning method for visual object tracking that replaces traditional handcrafted spatial priors with a learned localization policy optimized directly for tracking metrics such as IoU and AUC. The approach achieves state-of-the-art results on LaSOT_ext benchmarks, demonstrating that reward-driven localization outperforms conventional prior-based methods.
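
For readers unfamiliar with the IoU metric mentioned in this summary, here is a minimal sketch of the standard intersection-over-union computation for two axis-aligned boxes. The function name `box_iou` and the `(x1, y1, x2, y2)` box layout are assumptions for illustration; the summary does not describe how RELO turns this into a reward.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2."""
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Two degenerate (zero-area) boxes have no meaningful overlap ratio.
    return inter / union if union > 0 else 0.0
```

A value of 1.0 means the predicted and ground-truth boxes coincide, and 0.0 means they do not overlap.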

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Divide and Conquer: Object Co-occurrence Helps Mitigate Simplicity Bias in OOD Detection

Researchers propose OCO (Object Co-occurrence), a new out-of-distribution detection framework that leverages object co-occurrence patterns within images to improve the reliability of deep learning models. The method addresses simplicity bias by learning disentangled representations and using divide-and-conquer logic to distinguish near-OOD samples, achieving competitive results across multiple OOD detection benchmarks.

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

Countering the Over-Reliance Trap: Mitigating Object Hallucination for LVLMs via a Self-Validation Framework

Researchers propose a Self-Validation Framework to address object hallucination in Large Vision Language Models (LVLMs), where models generate descriptions of non-existent objects in images. The training-free approach validates object existence through language-prior-free verification and achieves a 65.6% improvement on benchmark metrics, suggesting a novel path to enhance LVLM reliability without additional training.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10

EgoGrasp: World-Space Hand-Object Interaction Estimation from Egocentric Videos

EgoGrasp introduces the first method to reconstruct world-space hand-object interactions from egocentric videos using open-vocabulary objects. The multi-stage framework combines vision foundation models with body-guided diffusion models to achieve state-of-the-art performance in 3D scene reconstruction and hand pose estimation.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 6

DINOv3 Meets YOLO26 for Weed Detection in Vegetable Crops

Researchers developed a foundational crop-weed detection model combining DINOv3 vision transformer with YOLO26 architecture, achieving significant improvements in precision agriculture applications. The model showed up to 14% better performance on cross-domain datasets while maintaining real-time processing at 28.5 fps despite increased computational requirements.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

YCDa: YCbCr Decoupled Attention for Real-time Realistic Camouflaged Object Detection

Researchers propose YCDa, a new strategy for real-time camouflaged object detection that mimics human vision by separating color and brightness information. The method achieves a 112% improvement in detection accuracy and can be integrated into existing detection systems with minimal computational overhead.
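
To make the brightness/color split concrete, the sketch below converts an 8-bit RGB pixel to YCbCr, where Y carries luma and Cb/Cr carry chroma. It assumes the full-range BT.601 (JFIF-style) coefficients; the summary does not say which YCbCr variant YCDa uses, and the function name `rgb_to_ycbcr` is hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range BT.601 YCbCr (assumed variant)."""
    # Luma: weighted sum of the three channels (weights sum to 1).
    y = 0.299 * r + 0.587 * g + 0.114 * b
    # Chroma: blue-difference and red-difference, offset so neutral gray sits at 128.
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr
```

Any neutral gray maps to Cb = Cr = 128, so all of its information lands in the Y channel, which is what makes the two kinds of signal separable.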

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 5

From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects

Researchers have developed a framework that enables open vocabulary object detection models to operate in real-world settings by identifying and learning previously unseen objects. The method introduces techniques called Open World Embedding Learning (OWEL) and Multi-Scale Contrastive Anchor Learning (MSCAL) to detect unknown objects and reduce misclassification errors.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6

Q$^2$: Quantization-Aware Gradient Balancing and Attention Alignment for Low-Bit Quantization

Researchers propose Q², a new framework that addresses gradient imbalance issues in quantization-aware training for complex visual tasks like object detection and image segmentation. The method achieves significant performance improvements (+2.5% mAP for object detection, +3.7% mDICE for segmentation) while introducing no inference-time overhead.

AI · Neutral · arXiv – CS AI · Mar 17 · 4/10

Eyes on Target: Gaze-Aware Object Detection in Egocentric Video

Researchers developed 'Eyes on Target', a gaze-aware object detection framework that integrates human eye tracking with Vision Transformers to improve object detection in egocentric videos. The system biases spatial feature selection toward human-attended regions, demonstrating consistent accuracy improvements over traditional methods on multiple datasets including Ego4D.
