AI · Bullish · arXiv – CS AI · 7h ago · 13
🧠 Researchers introduced BEV-VLM, a new autonomous driving trajectory planning system that combines Vision-Language Models with Bird's-Eye View maps from camera and LiDAR data. The approach achieved 53.1% better planning accuracy and complete collision avoidance compared to vision-only methods on the nuScenes dataset.
AI · Bullish · arXiv – CS AI · 7h ago · 7
🧠 Researchers introduce Max-V1, a novel vision-language model framework that treats autonomous driving as a language problem, predicting trajectories from camera input. The model achieved over 30% performance improvement on the nuScenes dataset and demonstrates strong cross-vehicle adaptability.
AI · Bullish · arXiv – CS AI · 7h ago · 14
🧠 DeepEyesV2 is a new agentic multimodal AI model that combines text and image comprehension with external tools such as code execution and web search. The research introduces a two-stage training pipeline and the RealX-Bench evaluation framework, demonstrating improved real-world reasoning capabilities through adaptive tool invocation.
AI · Neutral · arXiv – CS AI · 7h ago · 8
🧠 Researchers introduce SWITCH, a new benchmark for testing autonomous AI agents' ability to interact with physical interfaces like switches and appliance panels in real-world scenarios. The benchmark reveals significant gaps in current AI models' capabilities for long-horizon tasks requiring causal reasoning and verification.
AI · Bullish · arXiv – CS AI · 7h ago · 7
🧠 SceneTok introduces a novel 3D scene tokenizer that compresses view sets into permutation-invariant tokens, achieving 1-3 orders of magnitude better compression than existing methods while maintaining state-of-the-art reconstruction quality. The system enables efficient 3D scene generation in 5 seconds using a lightweight decoder that can render novel viewpoints.
AI · Bullish · arXiv – CS AI · 7h ago · 6
🧠 Researchers introduced SemVideo, an AI framework that reconstructs videos from brain activity recorded with fMRI scans. The system uses hierarchical semantic guidance to overcome previous limitations in visual consistency and temporal coherence, achieving state-of-the-art results in brain-to-video reconstruction.
AI · Neutral · arXiv – CS AI · 7h ago · 1
🧠 Researchers introduce ANTShapes, a Unity-based simulation framework that generates synthetic neuromorphic vision datasets to address the scarcity of Dynamic Vision Sensor data. The tool creates configurable 3D scenes with randomly-behaving objects for training anomaly detection and object recognition systems in event-based computer vision.
AI · Neutral · arXiv – CS AI · 7h ago · 1
🧠 Researchers have released TaCarla, a dataset of over 2.85 million frames from the CARLA simulation environment, designed for end-to-end autonomous driving research. The dataset addresses limitations in existing autonomous driving datasets by providing both perception and planning data with diverse behavioral scenarios for model training and evaluation.
AI · Neutral · arXiv – CS AI · 7h ago · 1
🧠 Researchers developed a dual-branch neural network for micro-expression recognition that combines residual and Inception networks with parallel attention mechanisms. The method achieved 74.67% accuracy on the CASME II dataset, significantly outperforming existing approaches like LBP-TOP by over 11%.
AI · Neutral · arXiv – CS AI · 7h ago · 1
🧠 Researchers introduce DirMixE, a new machine learning approach for handling test-agnostic long-tail recognition problems where test data distributions are unknown and imbalanced. The method uses a hierarchical Mixture-of-Expert strategy with Dirichlet meta-distributions and includes a Latent Skill Finetuning framework for efficient parameter tuning of foundation models.
AI · Bullish · arXiv – CS AI · 7h ago · 1
🧠 Researchers have developed R2GenCSR, a new AI framework for generating radiology reports that uses the Mamba architecture instead of Transformers to reduce computational complexity while maintaining performance. The system leverages context retrieval and large language models to produce high-quality medical reports from X-ray images.
AI · Neutral · arXiv – CS AI · 7h ago · 1
🧠 Researchers propose a new concept-based adversarial attack framework that targets entire concept distributions rather than single images, generating diverse adversarial examples while preserving the original concept identity. The method creates adversarial images with variations in pose, viewpoint, or background that can still mislead classifiers while remaining recognizable as instances of the original category.
AI · Neutral · arXiv – CS AI · 7h ago · 1
🧠 Researchers analyzed the DINOv2 vision transformer using Sparse Autoencoders to understand how it processes visual information, discovering that the model uses specialized concept dictionaries for different tasks like classification and segmentation. They propose the Minkowski Representation Hypothesis as a new framework for understanding how vision transformers combine conceptual archetypes to form representations.
AI · Neutral · arXiv – CS AI · 7h ago · 1
🧠 Researchers introduce USplat4D, a new uncertainty-aware dynamic Gaussian Splatting framework that improves 3D scene reconstruction from monocular video by modeling per-Gaussian uncertainty. The approach addresses motion drift and poor synthesis quality by treating well-observed Gaussians as reliable anchors while handling poorly observed ones as less reliable.