AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce ZALT, an imitation learning method that enables AI agents to solve unseen tasks by identifying latent hub states in demonstrated trajectories and planning over abstract topology. The approach achieves 55% zero-shot success on complex maze tasks compared to 6% for existing baselines, addressing the challenge of adapting learned behaviors to new long-horizon goals without additional training.
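The summary does not say how ZALT detects hub states or plans over them. The sketch below is only a rough illustration of the general idea, and it rests on three assumptions: a "hub" is any state that appears in at least two demonstrations, two hubs are linked when they occur one after another within a demonstration, and planning is a breadth-first search over the resulting hub graph. The function names and the `min_visits` threshold are hypothetical and are not ZALT's.

```python
from collections import Counter, deque


def find_hubs(trajectories, min_visits=2):
    """Hypothetical hub detector: states that occur in at least `min_visits` trajectories."""
    counts = Counter()
    for traj in trajectories:
        counts.update(set(traj))  # count each state at most once per trajectory
    return {state for state, c in counts.items() if c >= min_visits}


def build_hub_graph(trajectories, hubs):
    """Link hubs that follow each other in a demonstration, skipping non-hub states."""
    graph = {h: set() for h in hubs}
    for traj in trajectories:
        seq = [s for s in traj if s in hubs]
        for a, b in zip(seq, seq[1:]):
            if a != b:
                graph[a].add(b)
    return graph


def plan(graph, start, goal):
    """Breadth-first search for the shortest hub sequence from start to goal (None if unreachable)."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


# Toy demonstrations: no single one goes from H1 to H3.
demos = [
    ["p", "H1", "a", "H2"],
    ["q", "H2", "b", "H3"],
    ["H1", "r"],
    ["H3", "s"],
]
hubs = find_hubs(demos)
graph = build_hub_graph(demos, hubs)
route = plan(graph, "H1", "H3")  # ['H1', 'H2', 'H3']
```

The route stitches together segments from two different demonstrations, which is the kind of composition that planning over an abstract topology makes possible for a goal that was never demonstrated end to end.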
AI · Bullish · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce GibbsTTS, a new zero-shot text-to-speech system using metric-induced discrete flow matching with kinetic-optimal scheduling and moment correction. The method achieves superior naturalness and speaker similarity compared to existing masked generative models and state-of-the-art TTS systems without requiring hyperparameter tuning.
AI · Bullish · arXiv – CS AI · May 12 · 6/10
🧠Researchers propose Semantic Softmax, a novel inference-time method that improves zero-shot LLM classification by recovering probability mass lost during constrained decoding. The approach aggregates scores from semantic synonyms, reducing calibration errors and boosting accuracy on emotion and toxicity detection tasks.
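To make the aggregation idea concrete, the sketch below assumes we already have next-token log-probabilities from an LLM and a hand-written synonym list per label. Both inputs are hypothetical, and for simplicity each label and synonym is assumed to be a single token. Instead of reading only the probability of each exact label word, the function sums probability over every synonym of a label and then renormalizes across labels. This is an illustration of the general technique, not the paper's exact scoring rule.

```python
import math


def semantic_softmax(token_logprobs, synonyms):
    """Aggregate probability mass over each label's synonyms, then renormalize.

    token_logprobs: dict mapping a candidate token to its log-probability from the model.
    synonyms: dict mapping a label to the surface forms that count toward that label.
    """
    mass = {}
    for label, words in synonyms.items():
        # Add probabilities (not log-probabilities); unscored synonyms contribute nothing.
        mass[label] = sum(math.exp(token_logprobs[w]) for w in words if w in token_logprobs)
    total = sum(mass.values())
    if total == 0:
        raise ValueError("no synonym received any probability mass")
    return {label: m / total for label, m in mass.items()}


# Hypothetical model scores for an emotion-classification prompt.
logprobs = {
    "joy": math.log(0.10), "happy": math.log(0.30), "glad": math.log(0.05),
    "anger": math.log(0.25), "angry": math.log(0.05),
}
synonyms = {"joy": ["joy", "happy", "glad"], "anger": ["anger", "angry", "mad"]}
probs = semantic_softmax(logprobs, synonyms)
```

Reading only the exact label tokens would pick "anger" (0.25 vs 0.10), while pooling the synonyms gives "joy" about 0.6 and "anger" about 0.4, because most of the model's mass went to "happy".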
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠UNCOM is a zero-shot framework that enables robots to understand natural human commands in tabletop environments by integrating speech, gestures, and scene context without requiring task-specific training data. The system achieves an 82.39% success rate on real-world interaction scenarios, demonstrating practical viability for general-purpose domestic robotics applications.
AI · Neutral · arXiv – CS AI · May 9 · 6/10
🧠ActCam is a zero-shot AI method that enables simultaneous control of character motion and camera movement in video generation without requiring model retraining. The technique uses a two-phase conditioning approach with pose and depth constraints to generate videos with improved geometric consistency and motion fidelity across diverse scenarios.
AI · Bullish · arXiv – CS AI · May 7 · 6/10
🧠Researchers introduce JASTIN, an instruction-driven framework that combines frozen audio encoders with fine-tuned LLMs to evaluate generative audio models with zero-shot capabilities. The approach achieves state-of-the-art correlation with human ratings across speech, sound, and music evaluation tasks without task-specific retraining.
AI · Bullish · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers introduce DiZiNER, a framework that improves zero-shot named entity recognition by simulating human annotation disagreement processes using multiple LLMs. The approach achieves state-of-the-art results on 14 of 18 benchmarks, closing the performance gap between zero-shot and supervised systems by over 11 percentage points.
AI · Neutral · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers introduce ASPECT, a novel reinforcement learning framework that uses large language models as semantic operators to enable zero-shot transfer learning across novel tasks. By conditioning a text-based VAE on LLM-generated task descriptions, the approach allows agents to reuse policies on structurally similar but previously unseen tasks without discrete category constraints.
AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
🧠Researchers introduced Graph-of-Mark (GoM), a new visual prompting technique that overlays scene graphs onto images to improve spatial reasoning in multimodal language models. Testing across 3 open-source MLMs and 4 datasets showed GoM improved zero-shot visual question answering and localization accuracy by up to 11 percentage points compared to existing methods like Set-of-Mark.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers developed PREBA, a retrieval-augmented framework that uses PCA-weighted retrieval and Bayesian averaging to improve surgical duration prediction accuracy by up to 40% using large language models. The system grounds LLM predictions in institution-specific clinical data without requiring computationally intensive training, achieving performance competitive with supervised machine learning methods.
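The summary names two ingredients, weighted retrieval of similar past cases and Bayesian averaging, without giving formulas. The sketch below is a hedged illustration of one plausible combination. Per-feature weights are simply passed in (the paper derives its weighting from PCA; this sketch does not reproduce that step). The LLM's duration estimate is treated as a Gaussian prior and the retrieved durations as observations with a known variance, so the result is the standard conjugate-normal posterior mean. All names, features, and numbers are hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance with per-feature weights (assumed precomputed, e.g. offline)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def retrieve(query, cases, weights, k=3):
    """Return the durations of the k historical cases closest to the query."""
    ranked = sorted(cases, key=lambda case: weighted_distance(query, case[0], weights))
    return [duration for _, duration in ranked[:k]]


def bayesian_average(prior_mean, prior_var, observations, obs_var):
    """Posterior mean of a Gaussian mean with known variances (conjugate normal update)."""
    precision = 1.0 / prior_var + len(observations) / obs_var
    weighted_sum = prior_mean / prior_var + sum(observations) / obs_var
    return weighted_sum / precision


# Hypothetical historical cases: (feature vector, duration in minutes).
cases = [((1.0, 0.0), 90.0), ((1.1, 0.1), 100.0), ((0.9, 0.0), 110.0), ((5.0, 3.0), 240.0)]
neighbors = retrieve((1.0, 0.05), cases, weights=(1.0, 0.5), k=3)
llm_estimate = 130.0  # hypothetical LLM prediction in minutes
estimate = bayesian_average(llm_estimate, 400.0, neighbors, 400.0)
```

With equal variances, the three retrieved cases (90, 100, and 110 minutes) pull the LLM's 130-minute guess down to roughly 107.5 minutes. A larger prior variance (less trust in the LLM) moves the estimate closer to the retrieved mean of 100.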
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers propose MA-VLCM, a framework that uses pretrained vision-language models as centralized critics in multi-agent reinforcement learning instead of learning critics from scratch. This approach significantly improves sample efficiency and enables zero-shot generalization while producing compact policies suitable for resource-constrained robots.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce AutoEP, a framework that uses Large Language Models (LLMs) as zero-shot reasoning engines to automatically configure algorithm hyperparameters without requiring training. The system combines real-time landscape analysis with multi-LLM reasoning to outperform existing methods and enables open-source models like Qwen3-30B to match GPT-4's performance in optimization tasks.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers developed VLAD-Grasp, a training-free robotic grasping system that uses vision-language models to detect graspable objects without requiring curated datasets. The system achieves competitive performance with state-of-the-art methods on benchmark datasets and demonstrates zero-shot generalization to real-world robotic manipulation tasks.
AI · Bullish · arXiv – CS AI · Mar 9 · 6/10
🧠Researchers developed SecureRAG-RTL, a new AI framework that uses Retrieval-Augmented Generation to detect security vulnerabilities in hardware designs. The system improves detection accuracy by 30% on average across different LLM architectures and addresses the challenge of limited hardware security datasets for AI training.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠Meta researchers introduced MetaMind, a cognitive world model for multi-agent systems that enables agents to understand and predict other agents' behaviors without centralized supervision or communication. The system uses a meta-theory of mind framework allowing agents to reason about goals and beliefs of others through self-reflective learning and analogical reasoning.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠Researchers have developed FCN-LLM, a framework that enables Large Language Models to understand brain functional connectivity networks from fMRI scans through multi-task instruction tuning. The system uses a multi-scale encoder to capture brain features and demonstrates strong zero-shot generalization across unseen datasets, outperforming conventional supervised models.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠Researchers propose M3-AD, a new reflection-aware multimodal framework that improves industrial anomaly detection using large language models. The system includes RA-Monitor technology that enables AI models to self-correct unreliable decisions, outperforming existing open-source and commercial models in zero-shot anomaly detection tasks.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠Researchers introduce AG-VAS, a new AI framework that uses large multimodal models for zero-shot visual anomaly segmentation. The system employs learnable semantic anchor tokens and achieves state-of-the-art performance on industrial and medical benchmarks without requiring training data for specific anomaly types.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠Researchers developed a knowledge graph-guided chain-of-thought framework that uses large language models for disease prediction from electronic health records. The approach outperformed classical baselines and showed strong zero-shot transfer capabilities, with clinicians preferring the AI-generated explanations for their clarity and relevance.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠Researchers introduce LLaVE, a new multimodal embedding model that uses hardness-weighted contrastive learning to better distinguish between positive and negative pairs in image-text tasks. The model achieves state-of-the-art performance on the MMEB benchmark, with LLaVE-2B outperforming previous 7B models and demonstrating strong zero-shot transfer capabilities to video retrieval tasks.
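As a rough picture of what "hardness-weighted" contrastive learning can mean, the sketch below computes an InfoNCE loss for a single query where each negative's term in the denominator is scaled by a weight that grows with its similarity to the query. The weight form `w_j = exp(alpha * s_j)` and the default `temperature` and `alpha` values are assumptions chosen for illustration, and LLaVE's exact weighting may differ. The loss is computed in log space with a log-sum-exp for numerical stability.

```python
import math


def hardness_weighted_infonce(pos_sim, neg_sims, temperature=0.05, alpha=5.0):
    """InfoNCE loss for one query, with each negative scaled by an assumed hardness weight.

    pos_sim: similarity between the query and its positive.
    neg_sims: similarities between the query and its negatives.
    Negative j contributes w_j * exp(s_j / T) with w_j = exp(alpha * s_j),
    so negatives that look more like the positive weigh more in the denominator.
    """
    # Log of each denominator term: log w_j + s_j / T for negatives, s_pos / T for the positive.
    logits = [pos_sim / temperature] + [alpha * s + s / temperature for s in neg_sims]
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]  # equals -log(positive term / denominator)
```

Setting `alpha=0` recovers plain InfoNCE. Raising it increases the loss contributed by negatives that sit close to the query (for example a 0.7-similarity negative against a 0.8-similarity positive), which pushes training to separate the hard pairs rather than the easy ones.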
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10
🧠Researchers propose a data-efficient framework to convert generative Multimodal Large Language Models into universal embedding models without extensive pre-training. The method uses hierarchical embedding prompts and Self-aware Hard Negative Sampling to achieve competitive performance on embedding benchmarks using minimal training data.
AI · Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠Researchers have created CzechTopic, a new benchmark dataset for evaluating AI models' ability to identify specific topics within historical Czech documents. The study compared various large language models and BERT-based models, finding significant performance variations with the strongest models approaching human-level accuracy in topic detection.
AI · Neutral · arXiv – CS AI · Mar 3 · 4/10
🧠Researchers have created CrimeNER, a specialized dataset of over 1,500 annotated crime-related documents for training named-entity recognition AI models. The study addresses the lack of quality training data in the crime domain by developing a database from terrorist attack reports and DOJ press notes, defining 22 types of crime-related entities.
AI · Bullish · Apple Machine Learning · Mar 3 · 5/10
🧠EMBridge is a new AI framework that enhances gesture recognition from EMG biosignals by aligning them with high-quality structured data from videos and images. The technology enables zero-shot gesture generalization on low-power wearable devices, potentially advancing human-computer interaction applications.
AI · Neutral · arXiv – CS AI · Feb 27 · 4/10
🧠Researchers benchmarked small language models (SLMs) for leader-follower role classification in human-robot interaction, finding that fine-tuned Qwen2.5-0.5B achieves 86.66% accuracy with 22.2ms latency. The study demonstrates SLMs can effectively handle real-time role assignment for resource-constrained robots, though performance degrades with increased dialogue complexity.