y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#zero-shot-learning News & Analysis

52 articles tagged with #zero-shot-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

52 articles
AINeutralarXiv – CS AI · May 126/10
🧠

Zero-shot Imitation Learning by Latent Topology Mapping

Researchers introduce ZALT, an imitation learning method that enables AI agents to solve unseen tasks by identifying latent hub states in demonstrated trajectories and planning over abstract topology. The approach achieves 55% zero-shot success on complex maze tasks compared to 6% for existing baselines, addressing the challenge of adapting learned behaviors to new long-horizon goals without additional training.

AIBullisharXiv – CS AI · May 126/10
🧠

Kinetic-Optimal Scheduling with Moment Correction for Metric-Induced Discrete Flow Matching in Zero-Shot Text-to-Speech

Researchers introduce GibbsTTS, a new zero-shot text-to-speech system using metric-induced discrete flow matching with kinetic-optimal scheduling and moment correction. The method achieves superior naturalness and speaker similarity compared to existing masked generative models and state-of-the-art TTS systems without requiring hyperparameter tuning.

AIBullisharXiv – CS AI · May 126/10
🧠

The Silent Vote: Improving Zero-Shot LLM Reliability by Aggregating Semantic Neighborhoods

Researchers propose Semantic Softmax, a novel inference-time method that improves zero-shot LLM classification by recovering probability mass lost during constrained decoding. The approach aggregates scores from semantic synonyms, reducing calibration errors and boosting accuracy on emotion and toxicity detection tasks.

AINeutralarXiv – CS AI · May 116/10
🧠

UNCOM: Zero-shot Context-Aware Command Understanding for Tabletop Scenarios

UNCOM is a zero-shot framework that enables robots to understand natural human commands in tabletop environments by integrating speech, gestures, and scene context without requiring task-specific training data. The system achieves 82.39% success rate on real-world interaction scenarios, demonstrating practical viability for general-purpose domestic robotics applications.

AINeutralarXiv – CS AI · May 96/10
🧠

ActCam: Zero-Shot Joint Camera and 3D Motion Control for Video Generation

ActCam is a zero-shot AI method that enables simultaneous control of character motion and camera movement in video generation without requiring model retraining. The technique uses a two-phase conditioning approach with pose and depth constraints to generate videos with improved geometric consistency and motion fidelity across diverse scenarios.

AIBullisharXiv – CS AI · Apr 206/10
🧠

DiZiNER: Disagreement-guided Instruction Refinement via Pilot Annotation Simulation for Zero-shot Named Entity Recognition

Researchers introduce DiZiNER, a framework that improves zero-shot named entity recognition by simulating human annotation disagreement processes using multiple LLMs. The approach achieves state-of-the-art results on 14 of 18 benchmarks, closing the performance gap between zero-shot and supervised systems by over 11 percentage points.

🧠 GPT-5
AINeutralarXiv – CS AI · Apr 136/10
🧠

ASPECT:Analogical Semantic Policy Execution via Language Conditioned Transfer

Researchers introduce ASPECT, a novel reinforcement learning framework that uses large language models as semantic operators to enable zero-shot transfer learning across novel tasks. By conditioning a text-based VAE on LLM-generated task descriptions, the approach allows agents to reuse policies on structurally similar but previously unseen tasks without discrete category constraints.

AIBullisharXiv – CS AI · Mar 276/10
🧠

Graph-of-Mark: Promote Spatial Reasoning in Multimodal Language Models with Graph-Based Visual Prompting

Researchers introduced Graph-of-Mark (GoM), a new visual prompting technique that overlays scene graphs onto images to improve spatial reasoning in multimodal language models. Testing across 3 open-source MLMs and 4 datasets showed GoM improved zero-shot visual question answering and localization accuracy by up to 11 percentage points compared to existing methods like Set-of-Mark.

AIBullisharXiv – CS AI · Mar 176/10
🧠

PREBA: Surgical Duration Prediction via PCA-Weighted Retrieval-Augmented LLMs and Bayesian Averaging Aggregation

Researchers developed PREBA, a retrieval-augmented framework that uses PCA-weighted retrieval and Bayesian averaging to improve surgical duration prediction accuracy by up to 40% using large language models. The system grounds LLM predictions in institution-specific clinical data without requiring computationally intensive training, achieving performance competitive with supervised machine learning methods.

AIBullisharXiv – CS AI · Mar 176/10
🧠

AutoEP: LLMs-Driven Automation of Hyperparameter Evolution for Metaheuristic Algorithms

Researchers introduce AutoEP, a framework that uses Large Language Models (LLMs) as zero-shot reasoning engines to automatically configure algorithm hyperparameters without requiring training. The system combines real-time landscape analysis with multi-LLM reasoning to outperform existing methods and enables open-source models like Qwen3-30B to match GPT-4's performance in optimization tasks.

🧠 GPT-4
AIBullisharXiv – CS AI · Mar 176/10
🧠

VLAD-Grasp: Zero-shot Grasp Detection via Vision-Language Models

Researchers developed VLAD-Grasp, a training-free robotic grasping system that uses vision-language models to detect graspable objects without requiring curated datasets. The system achieves competitive performance with state-of-the-art methods on benchmark datasets and demonstrates zero-shot generalization to real-world robotic manipulation tasks.

AIBullisharXiv – CS AI · Mar 37/107
🧠

MetaMind: General and Cognitive World Models in Multi-Agent Systems by Meta-Theory of Mind

Meta researchers introduced MetaMind, a cognitive world model for multi-agent systems that enables agents to understand and predict other agents' behaviors without centralized supervision or communication. The system uses a meta-theory of mind framework allowing agents to reason about goals and beliefs of others through self-reflective learning and analogical reasoning.

AIBullisharXiv – CS AI · Mar 36/108
🧠

FCN-LLM: Empower LLM for Brain Functional Connectivity Network Understanding via Graph-level Multi-task Instruction Tuning

Researchers have developed FCN-LLM, a framework that enables Large Language Models to understand brain functional connectivity networks from fMRI scans through multi-task instruction tuning. The system uses a multi-scale encoder to capture brain features and demonstrates strong zero-shot generalization across unseen datasets, outperforming conventional supervised models.

AIBullisharXiv – CS AI · Mar 36/107
🧠

M3-AD: Reflection-aware Multi-modal, Multi-category, and Multi-dimensional Benchmark and Framework for Industrial Anomaly Detection

Researchers propose M3-AD, a new reflection-aware multimodal framework that improves industrial anomaly detection using large language models. The system includes RA-Monitor technology that enables AI models to self-correct unreliable decisions, outperforming existing open-source and commercial models in zero-shot anomaly detection tasks.

AIBullisharXiv – CS AI · Mar 36/107
🧠

AG-VAS: Anchor-Guided Zero-Shot Visual Anomaly Segmentation with Large Multimodal Models

Researchers introduce AG-VAS, a new AI framework that uses large multimodal models for zero-shot visual anomaly segmentation. The system employs learnable semantic anchor tokens and achieves state-of-the-art performance on industrial and medical benchmarks without requiring training data for specific anomaly types.

AIBullisharXiv – CS AI · Mar 36/103
🧠

Knowledge Graph Augmented Large Language Models for Disease Prediction

Researchers developed a knowledge graph-guided chain-of-thought framework that uses large language models for disease prediction from electronic health records. The approach outperformed classical baselines and showed strong zero-shot transfer capabilities, with clinicians preferring the AI-generated explanations for their clarity and relevance.

AIBullisharXiv – CS AI · Mar 36/104
🧠

LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning

Researchers introduce LLaVE, a new multimodal embedding model that uses hardness-weighted contrastive learning to better distinguish between positive and negative pairs in image-text tasks. The model achieves state-of-the-art performance on the MMEB benchmark, with LLaVE-2B outperforming previous 7B models and demonstrating strong zero-shot transfer capabilities to video retrieval tasks.

AIBullisharXiv – CS AI · Mar 26/1014
🧠

From Generator to Embedder: Harnessing Innate Abilities of Multimodal LLMs via Building Zero-Shot Discriminative Embedding Model

Researchers propose a data-efficient framework to convert generative Multimodal Large Language Models into universal embedding models without extensive pre-training. The method uses hierarchical embedding prompts and Self-aware Hard Negative Sampling to achieve competitive performance on embedding benchmarks using minimal training data.

AINeutralarXiv – CS AI · Mar 54/10
🧠

CzechTopic: A Benchmark for Zero-Shot Topic Localization in Historical Czech Documents

Researchers have created CzechTopic, a new benchmark dataset for evaluating AI models' ability to identify specific topics within historical Czech documents. The study compared various large language models and BERT-based models, finding significant performance variations with the strongest models approaching human-level accuracy in topic detection.

AINeutralarXiv – CS AI · Feb 274/107
🧠

Evaluating Zero-Shot and One-Shot Adaptation of Small Language Models in Leader-Follower Interaction

Researchers benchmarked small language models (SLMs) for leader-follower role classification in human-robot interaction, finding that fine-tuned Qwen2.5-0.5B achieves 86.66% accuracy with 22.2ms latency. The study demonstrates SLMs can effectively handle real-time role assignment for resource-constrained robots, though performance degrades with increased dialogue complexity.

← PrevPage 2 of 3Next →