24 articles tagged with #zero-shot-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · Mar 9 · 7/10
🧠 Researchers introduce RAG-Driver, a retrieval-augmented multi-modal large language model designed for autonomous driving that can provide explainable decisions and control predictions. The system addresses data scarcity and generalization challenges in AI-driven autonomous vehicles by using in-context learning and expert demonstration retrieval.
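The retrieval step in such a system can be sketched as a nearest-neighbor lookup over embedded expert demonstrations, followed by in-context prompt assembly. Everything below (function names, the cosine metric, the prompt layout) is an illustrative assumption, not RAG-Driver's actual pipeline:

```python
import numpy as np

def retrieve_demonstrations(query_vec, demo_vecs, demo_texts, k=2):
    """Return the k expert demonstrations most similar to the query scenario."""
    q = query_vec / np.linalg.norm(query_vec)
    d = demo_vecs / np.linalg.norm(demo_vecs, axis=1, keepdims=True)
    sims = d @ q                       # cosine similarity to each demonstration
    top = np.argsort(sims)[::-1][:k]   # indices of the k closest matches
    return [demo_texts[i] for i in top]

def build_prompt(query_desc, demos):
    """Assemble an in-context prompt: retrieved demonstrations, then the query."""
    examples = "\n\n".join(f"Example:\n{d}" for d in demos)
    return f"{examples}\n\nCurrent scene:\n{query_desc}\nAction and explanation:"
```

Retrieved demonstrations become few-shot examples, so the model can ground its control prediction and explanation in similar past situations without any fine-tuning.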
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10
🧠 Researchers introduce Retrieval-Augmented Robotics (RAR), a new paradigm enabling robots to actively retrieve and use external visual documentation to execute complex tasks. The system uses a Retrieve-Reason-Act loop where robots search unstructured visual manuals, align 2D diagrams with 3D objects, and synthesize executable plans for assembly tasks.
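The Retrieve-Reason-Act loop can be sketched as three stages wired in sequence. The retriever, reasoner, and executor below are stand-in stubs under assumed data shapes, not the paper's components:

```python
def retrieve(task, manual_pages):
    # Hypothetical retrieval: keep pages whose keywords overlap the task words.
    return [p for p in manual_pages if any(w in p["keywords"] for w in task.split())]

def reason(task, pages):
    # Hypothetical reasoning: flatten retrieved pages into an ordered plan.
    return [step for p in pages for step in p["steps"]]

def act(plan, execute):
    # Execute each planned step; stop early if a step fails.
    done = []
    for step in plan:
        if not execute(step):
            break
        done.append(step)
    return done
```

In the real system each stage would be learned (visual retrieval, 2D-to-3D alignment, plan synthesis); the skeleton only shows how the three stages hand off to one another.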
AI · Neutral · arXiv – CS AI · Mar 4 · 7/10
🧠 Researchers introduce GraphSSR, a new framework that improves zero-shot graph learning by combining Large Language Models with adaptive subgraph denoising. The system addresses structural noise issues in existing methods through a dynamic 'Sample-Select-Reason' pipeline and reinforcement learning training.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠 Researchers introduce VITA, a zero-shot value function learning method that enhances Vision-Language Models through test-time adaptation for robotic manipulation tasks. The system updates parameters sequentially over trajectories to improve temporal reasoning and generalizes across diverse environments, outperforming existing autoregressive VLM methods.
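Sequential test-time updates over a trajectory can be illustrated with a toy self-supervised objective: nudge a linear value head so that predicted values do not decrease as the trajectory approaches task completion. The hinge objective and linear head are assumptions for illustration, not VITA's actual loss:

```python
import numpy as np

def adapt_value_head(w, trajectory, lr=0.1, passes=5):
    """Sequentially update a linear value head at test time so predicted
    values increase along the trajectory (a simple temporal-order objective)."""
    w = w.copy()
    for _ in range(passes):
        for s_t, s_next in zip(trajectory[:-1], trajectory[1:]):
            margin = w @ s_t - w @ s_next   # positive when temporal order is violated
            if margin > 0:                  # hinge: only penalize violations
                w -= lr * (s_t - s_next)    # gradient step on the hinge term
    return w
```

The key property mirrored here is that adaptation needs no labels, only the ordering of states within the observed trajectory.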
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠 Researchers developed UrbanFM, a foundation model for urban spatio-temporal data that can analyze traffic patterns and city dynamics across over 100 global cities. The model demonstrates zero-shot generalization capabilities, meaning it can make predictions for unseen cities without additional training, potentially revolutionizing urban planning and smart city applications.
AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
🧠 Researchers introduced Graph-of-Mark (GoM), a new visual prompting technique that overlays scene graphs onto images to improve spatial reasoning in multimodal language models. Testing across 3 open-source MLMs and 4 datasets showed GoM improved zero-shot visual question answering and localization accuracy by up to 11 percentage points compared to existing methods like Set-of-Mark.
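Overlaying a scene graph amounts to placing a mark at each object and drawing labeled relation edges between them. A minimal geometry sketch, assuming bounding boxes and (subject, predicate, object) triples as inputs (the data shapes are illustrative, not GoM's format):

```python
def layout_marks(boxes, relations):
    """boxes: {name: (x1, y1, x2, y2)}; relations: [(subj, pred, obj)].
    Returns mark positions per object and labeled line segments per relation."""
    centers = {n: ((x1 + x2) / 2, (y1 + y2) / 2)
               for n, (x1, y1, x2, y2) in boxes.items()}
    edges = [(centers[s], pred, centers[o]) for s, pred, o in relations]
    return centers, edges
```

A renderer would then draw numbered marks at the centers and the predicate text along each edge, giving the multimodal model an explicit spatial scaffold to reason over.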
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers developed PREBA, a retrieval-augmented framework that uses PCA-weighted retrieval and Bayesian averaging to improve surgical duration prediction accuracy by up to 40% using large language models. The system grounds LLM predictions in institution-specific clinical data without requiring computationally intensive training, achieving performance competitive with supervised machine learning methods.
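PCA-weighted retrieval can be sketched as weighting each feature by the variance PCA attributes to it, then retrieving the nearest historical cases under that metric. Here a plain mean of retrieved durations stands in for the paper's Bayesian averaging step, and all names and data shapes are assumptions:

```python
import numpy as np

def pca_weights(X):
    """Weight each feature by its variance-weighted contribution across
    principal components (a rough stand-in for PCA-weighted retrieval)."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    w = (vecs ** 2) @ np.abs(vals)   # per-feature contribution
    return w / w.sum()

def retrieve_and_average(x, X, durations, k=3):
    """Retrieve the k most similar historical cases under the PCA-weighted
    metric and average their recorded surgical durations."""
    w = pca_weights(X)
    dists = np.sqrt((((X - x) ** 2) * w).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    return float(np.mean(durations[nearest]))
```

Because the weighting comes from the institution's own case history, uninformative features (constant across cases) get near-zero weight and stop distorting the retrieval.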
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers propose MA-VLCM, a framework that uses pretrained vision-language models as centralized critics in multi-agent reinforcement learning instead of learning critics from scratch. This approach significantly improves sample efficiency and enables zero-shot generalization while producing compact policies suitable for resource-constrained robots.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers introduce AutoEP, a framework that uses Large Language Models (LLMs) as zero-shot reasoning engines to automatically configure algorithm hyperparameters without requiring training. The system combines real-time landscape analysis with multi-LLM reasoning to outperform existing methods and enables open-source models like Qwen3-30B to match GPT-4's performance in optimization tasks.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers developed VLAD-Grasp, a training-free robotic grasping system that uses vision-language models to detect graspable objects without requiring curated datasets. The system achieves competitive performance with state-of-the-art methods on benchmark datasets and demonstrates zero-shot generalization to real-world robotic manipulation tasks.
AI · Bullish · arXiv – CS AI · Mar 9 · 6/10
🧠 Researchers developed SecureRAG-RTL, a new AI framework that uses Retrieval-Augmented Generation to detect security vulnerabilities in hardware designs. The system improves detection accuracy by 30% on average across different LLM architectures and addresses the challenge of limited hardware security datasets for AI training.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠 Meta researchers introduced MetaMind, a cognitive world model for multi-agent systems that enables agents to understand and predict other agents' behaviors without centralized supervision or communication. The system uses a meta-theory of mind framework allowing agents to reason about the goals and beliefs of others through self-reflective learning and analogical reasoning.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers have developed FCN-LLM, a framework that enables Large Language Models to understand brain functional connectivity networks from fMRI scans through multi-task instruction tuning. The system uses a multi-scale encoder to capture brain features and demonstrates strong zero-shot generalization across unseen datasets, outperforming conventional supervised models.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers propose M3-AD, a new reflection-aware multimodal framework that improves industrial anomaly detection using large language models. The system includes RA-Monitor technology that enables AI models to self-correct unreliable decisions, outperforming existing open-source and commercial models in zero-shot anomaly detection tasks.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers introduce AG-VAS, a new AI framework that uses large multimodal models for zero-shot visual anomaly segmentation. The system employs learnable semantic anchor tokens and achieves state-of-the-art performance on industrial and medical benchmarks without requiring training data for specific anomaly types.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers developed a knowledge graph-guided chain-of-thought framework that uses large language models for disease prediction from electronic health records. The approach outperformed classical baselines and showed strong zero-shot transfer capabilities, with clinicians preferring the AI-generated explanations for their clarity and relevance.
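Knowledge-graph guidance of chain-of-thought typically means retrieving KG triples relevant to the patient's findings and turning them into explicit reasoning steps in the prompt. A minimal sketch with a toy KG; the triples, helper names, and prompt wording are all illustrative assumptions:

```python
# Toy knowledge graph as (head, relation, tail) triples -- illustrative only.
KG = [
    ("polyuria", "symptom_of", "diabetes"),
    ("fatigue", "symptom_of", "diabetes"),
    ("fatigue", "symptom_of", "anemia"),
]

def kg_paths(findings, kg):
    """Collect KG triples that connect the patient's findings to diseases."""
    return [(h, r, t) for (h, r, t) in kg if h in findings]

def build_cot_prompt(findings, kg):
    """Turn retrieved triples into chain-of-thought steps for the LLM."""
    steps = [f"- {h} is a {r.replace('_', ' ')} {t}" for h, r, t in kg_paths(findings, kg)]
    return ("Patient findings: " + ", ".join(findings) + "\n"
            "Relevant knowledge:\n" + "\n".join(steps) + "\n"
            "Reason step by step, then name the most likely disease.")
```

Grounding each reasoning step in a retrieved triple is what makes the resulting explanation auditable, which plausibly underlies the clinicians' preference noted above.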
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers introduce LLaVE, a new multimodal embedding model that uses hardness-weighted contrastive learning to better distinguish between positive and negative pairs in image-text tasks. The model achieves state-of-the-art performance on the MMEB benchmark, with LLaVE-2B outperforming previous 7B models and demonstrating strong zero-shot transfer capabilities to video retrieval tasks.
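The core idea of hardness-weighted contrastive learning can be sketched as an InfoNCE-style loss where each negative's contribution is up-weighted by how similar it is to the query. The specific weighting scheme and hyperparameters below are illustrative assumptions, not LLaVE's exact formulation:

```python
import numpy as np

def hardness_weighted_infonce(sim_pos, sim_negs, alpha=1.0, temp=0.07):
    """Contrastive loss where harder negatives (higher similarity to the
    query) receive larger weights in the denominator."""
    sim_negs = np.asarray(sim_negs, dtype=float)
    weights = np.exp(alpha * sim_negs)      # hardness-based up-weighting
    weights = weights / weights.mean()      # keep the overall scale comparable
    pos = np.exp(sim_pos / temp)
    negs = (weights * np.exp(sim_negs / temp)).sum()
    return float(-np.log(pos / (pos + negs)))
```

Setting `alpha=0` recovers plain InfoNCE; increasing it concentrates the penalty on the negatives the model currently confuses with the positive.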
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10
🧠 Researchers propose a data-efficient framework to convert generative Multimodal Large Language Models into universal embedding models without extensive pre-training. The method uses hierarchical embedding prompts and Self-aware Hard Negative Sampling to achieve competitive performance on embedding benchmarks using minimal training data.
AI · Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠 Researchers have created CzechTopic, a new benchmark dataset for evaluating AI models' ability to identify specific topics within historical Czech documents. The study compared various large language models and BERT-based models, finding significant performance variations, with the strongest models approaching human-level accuracy in topic detection.
AI · Neutral · arXiv – CS AI · Mar 3 · 4/10
🧠 Researchers have created CrimeNER, a specialized dataset of over 1,500 annotated crime-related documents for training named-entity recognition AI models. The study addresses the lack of quality training data in the crime domain by developing a database from terrorist attack reports and DOJ press notes, defining 22 types of crime-related entities.
AI · Bullish · Apple Machine Learning · Mar 3 · 5/10
🧠 EMBridge is a new AI framework that enhances gesture recognition from EMG biosignals by aligning them with high-quality structured data from videos and images. The technology enables zero-shot gesture generalization on low-power wearable devices, potentially advancing human-computer interaction applications.
AI · Neutral · arXiv – CS AI · Feb 27 · 4/10
🧠 Researchers benchmarked small language models (SLMs) for leader-follower role classification in human-robot interaction, finding that a fine-tuned Qwen2.5-0.5B achieves 86.66% accuracy with 22.2 ms latency. The study demonstrates that SLMs can effectively handle real-time role assignment for resource-constrained robots, though performance degrades with increased dialogue complexity.
AI · Neutral · Hugging Face Blog · Dec 21 · 4/10
🧠 The article appears to discuss CLIPSeg, a zero-shot image segmentation technology that can segment images without prior training on specific datasets. However, the article body is empty, making detailed analysis impossible.
AI · Neutral · arXiv – CS AI · Mar 3 · 4/10
🧠 Researchers developed a multi-condition digital twin calibration framework for axial piston pumps that can simulate compound faults and enable zero-shot fault diagnosis. The physics-data coupled approach addresses data scarcity issues in traditional fault detection methods and demonstrates accurate reproduction of both single and compound faults in hydraulic systems.