24 articles tagged with #zero-shot-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · Mar 9 · 7/10
🧠 Researchers introduce RAG-Driver, a retrieval-augmented multi-modal large language model designed for autonomous driving that can provide explainable decisions and control predictions. The system addresses data scarcity and generalization challenges in AI-driven autonomous vehicles by using in-context learning and expert demonstration retrieval.
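The retrieval step in such a system can be sketched as a nearest-neighbor lookup over embedded expert demonstrations, followed by in-context prompt assembly. Everything below (function names, the cosine metric, the prompt layout) is an illustrative assumption, not RAG-Driver's actual pipeline:

```python
import numpy as np

def retrieve_demonstrations(query_vec, demo_vecs, demo_texts, k=2):
    """Return the k expert demonstrations most similar to the query scenario."""
    q = query_vec / np.linalg.norm(query_vec)
    d = demo_vecs / np.linalg.norm(demo_vecs, axis=1, keepdims=True)
    sims = d @ q                       # cosine similarity to each demonstration
    top = np.argsort(sims)[::-1][:k]   # indices of the k closest matches
    return [demo_texts[i] for i in top]

def build_prompt(query_desc, demos):
    """Assemble an in-context prompt: retrieved demonstrations, then the query."""
    examples = "\n\n".join(f"Example:\n{d}" for d in demos)
    return f"{examples}\n\nCurrent scene:\n{query_desc}\nAction and explanation:"
```

Retrieved demonstrations become few-shot examples, so the model can ground its control prediction and explanation in similar past situations without any fine-tuning.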
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10
🧠 Researchers introduce Retrieval-Augmented Robotics (RAR), a new paradigm enabling robots to actively retrieve and use external visual documentation to execute complex tasks. The system uses a Retrieve-Reason-Act loop where robots search unstructured visual manuals, align 2D diagrams with 3D objects, and synthesize executable plans for assembly tasks.
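The Retrieve-Reason-Act loop can be sketched as three stages wired in sequence. The retriever, reasoner, and executor below are stand-in stubs under assumed data shapes, not the paper's components:

```python
def retrieve(task, manual_pages):
    # Hypothetical retrieval: keep pages whose keywords overlap the task words.
    return [p for p in manual_pages if any(w in p["keywords"] for w in task.split())]

def reason(task, pages):
    # Hypothetical reasoning: flatten retrieved pages into an ordered plan.
    return [step for p in pages for step in p["steps"]]

def act(plan, execute):
    # Execute each planned step; stop early if a step fails.
    done = []
    for step in plan:
        if not execute(step):
            break
        done.append(step)
    return done
```

In the real system each stage would be learned (visual retrieval, 2D-to-3D alignment, plan synthesis); the skeleton only shows how the three stages hand off to one another.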
AI · Neutral · arXiv – CS AI · Mar 4 · 7/10
🧠 Researchers introduce GraphSSR, a new framework that improves zero-shot graph learning by combining Large Language Models with adaptive subgraph denoising. The system addresses structural noise issues in existing methods through a dynamic 'Sample-Select-Reason' pipeline and reinforcement learning training.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠 Researchers introduce VITA, a zero-shot value function learning method that enhances Vision-Language Models through test-time adaptation for robotic manipulation tasks. The system updates parameters sequentially over trajectories to improve temporal reasoning and generalizes across diverse environments, outperforming existing autoregressive VLM methods.
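Sequential test-time updates over a trajectory can be illustrated with a toy self-supervised objective: nudge a linear value head so that predicted values do not decrease as the trajectory approaches task completion. The hinge objective and linear head are assumptions for illustration, not VITA's actual loss:

```python
import numpy as np

def adapt_value_head(w, trajectory, lr=0.1, passes=5):
    """Sequentially update a linear value head at test time so predicted
    values increase along the trajectory (a simple temporal-order objective)."""
    w = w.copy()
    for _ in range(passes):
        for s_t, s_next in zip(trajectory[:-1], trajectory[1:]):
            margin = w @ s_t - w @ s_next   # positive when temporal order is violated
            if margin > 0:                  # hinge: only penalize violations
                w -= lr * (s_t - s_next)    # gradient step on the hinge term
    return w
```

The key property mirrored here is that adaptation needs no labels, only the ordering of states within the observed trajectory.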
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠 Researchers developed UrbanFM, a foundation model for urban spatio-temporal data that can analyze traffic patterns and city dynamics across over 100 global cities. The model demonstrates zero-shot generalization capabilities, meaning it can make predictions for unseen cities without additional training, potentially revolutionizing urban planning and smart city applications.
AI · Bullish · arXiv – CS AI · Mar 27 · 6/10
🧠 Researchers introduced Graph-of-Mark (GoM), a new visual prompting technique that overlays scene graphs onto images to improve spatial reasoning in multimodal language models. Testing across 3 open-source MLMs and 4 datasets showed GoM improved zero-shot visual question answering and localization accuracy by up to 11 percentage points compared to existing methods like Set-of-Mark.
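Overlaying a scene graph amounts to placing a mark at each object and drawing labeled relation edges between them. A minimal geometry sketch, assuming bounding boxes and (subject, predicate, object) triples as inputs (the data shapes are illustrative, not GoM's format):

```python
def layout_marks(boxes, relations):
    """boxes: {name: (x1, y1, x2, y2)}; relations: [(subj, pred, obj)].
    Returns mark positions per object and labeled line segments per relation."""
    centers = {n: ((x1 + x2) / 2, (y1 + y2) / 2)
               for n, (x1, y1, x2, y2) in boxes.items()}
    edges = [(centers[s], pred, centers[o]) for s, pred, o in relations]
    return centers, edges
```

A renderer would then draw numbered marks at the centers and the predicate text along each edge, giving the multimodal model an explicit spatial scaffold to reason over.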
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers developed PREBA, a retrieval-augmented framework that uses PCA-weighted retrieval and Bayesian averaging to improve surgical duration prediction accuracy by up to 40% using large language models. The system grounds LLM predictions in institution-specific clinical data without requiring computationally intensive training, achieving performance competitive with supervised machine learning methods.
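PCA-weighted retrieval can be sketched as weighting each feature by the variance PCA attributes to it, then retrieving the nearest historical cases under that metric. Here a plain mean of retrieved durations stands in for the paper's Bayesian averaging step, and all names and data shapes are assumptions:

```python
import numpy as np

def pca_weights(X):
    """Weight each feature by its variance-weighted contribution across
    principal components (a rough stand-in for PCA-weighted retrieval)."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    w = (vecs ** 2) @ np.abs(vals)   # per-feature contribution
    return w / w.sum()

def retrieve_and_average(x, X, durations, k=3):
    """Retrieve the k most similar historical cases under the PCA-weighted
    metric and average their recorded surgical durations."""
    w = pca_weights(X)
    dists = np.sqrt((((X - x) ** 2) * w).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    return float(np.mean(durations[nearest]))
```

Because the weighting comes from the institution's own case history, uninformative features (constant across cases) get near-zero weight and stop distorting the retrieval.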
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers propose MA-VLCM, a framework that uses pretrained vision-language models as centralized critics in multi-agent reinforcement learning instead of learning critics from scratch. This approach significantly improves sample efficiency and enables zero-shot generalization while producing compact policies suitable for resource-constrained robots.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers introduce AutoEP, a framework that uses Large Language Models (LLMs) as zero-shot reasoning engines to automatically configure algorithm hyperparameters without requiring training. The system combines real-time landscape analysis with multi-LLM reasoning to outperform existing methods and enables open-source models like Qwen3-30B to match GPT-4's performance in optimization tasks.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers developed VLAD-Grasp, a training-free robotic grasping system that uses vision-language models to detect graspable objects without requiring curated datasets. The system achieves competitive performance with state-of-the-art methods on benchmark datasets and demonstrates zero-shot generalization to real-world robotic manipulation tasks.
AI · Bullish · arXiv – CS AI · Mar 9 · 6/10
🧠 Researchers developed SecureRAG-RTL, a new AI framework that uses Retrieval-Augmented Generation to detect security vulnerabilities in hardware designs. The system improves detection accuracy by 30% on average across different LLM architectures and addresses the challenge of limited hardware security datasets for AI training.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠 Meta researchers introduced MetaMind, a cognitive world model for multi-agent systems that enables agents to understand and predict other agents' behaviors without centralized supervision or communication. The system uses a meta-theory of mind framework allowing agents to reason about the goals and beliefs of others through self-reflective learning and analogical reasoning.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers have developed FCN-LLM, a framework that enables Large Language Models to understand brain functional connectivity networks from fMRI scans through multi-task instruction tuning. The system uses a multi-scale encoder to capture brain features and demonstrates strong zero-shot generalization across unseen datasets, outperforming conventional supervised models.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers propose M3-AD, a new reflection-aware multimodal framework that improves industrial anomaly detection using large language models. The system includes RA-Monitor technology that enables AI models to self-correct unreliable decisions, outperforming existing open-source and commercial models in zero-shot anomaly detection tasks.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers introduce AG-VAS, a new AI framework that uses large multimodal models for zero-shot visual anomaly segmentation. The system employs learnable semantic anchor tokens and achieves state-of-the-art performance on industrial and medical benchmarks without requiring training data for specific anomaly types.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers developed a knowledge graph-guided chain-of-thought framework that uses large language models for disease prediction from electronic health records. The approach outperformed classical baselines and showed strong zero-shot transfer capabilities, with clinicians preferring the AI-generated explanations for their clarity and relevance.
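Knowledge-graph guidance of chain-of-thought typically means retrieving KG triples relevant to the patient's findings and turning them into explicit reasoning steps in the prompt. A minimal sketch with a toy KG; the triples, helper names, and prompt wording are all illustrative assumptions:

```python
# Toy knowledge graph as (head, relation, tail) triples -- illustrative only.
KG = [
    ("polyuria", "symptom_of", "diabetes"),
    ("fatigue", "symptom_of", "diabetes"),
    ("fatigue", "symptom_of", "anemia"),
]

def kg_paths(findings, kg):
    """Collect KG triples that connect the patient's findings to diseases."""
    return [(h, r, t) for (h, r, t) in kg if h in findings]

def build_cot_prompt(findings, kg):
    """Turn retrieved triples into chain-of-thought steps for the LLM."""
    steps = [f"- {h} is a {r.replace('_', ' ')} {t}" for h, r, t in kg_paths(findings, kg)]
    return ("Patient findings: " + ", ".join(findings) + "\n"
            "Relevant knowledge:\n" + "\n".join(steps) + "\n"
            "Reason step by step, then name the most likely disease.")
```

Grounding each reasoning step in a retrieved triple is what makes the resulting explanation auditable, which plausibly underlies the clinicians' preference noted above.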
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers introduce LLaVE, a new multimodal embedding model that uses hardness-weighted contrastive learning to better distinguish between positive and negative pairs in image-text tasks. The model achieves state-of-the-art performance on the MMEB benchmark, with LLaVE-2B outperforming previous 7B models and demonstrating strong zero-shot transfer capabilities to video retrieval tasks.
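The core idea of hardness-weighted contrastive learning can be sketched as an InfoNCE-style loss where each negative's contribution is up-weighted by how similar it is to the query. The specific weighting scheme and hyperparameters below are illustrative assumptions, not LLaVE's exact formulation:

```python
import numpy as np

def hardness_weighted_infonce(sim_pos, sim_negs, alpha=1.0, temp=0.07):
    """Contrastive loss where harder negatives (higher similarity to the
    query) receive larger weights in the denominator."""
    sim_negs = np.asarray(sim_negs, dtype=float)
    weights = np.exp(alpha * sim_negs)      # hardness-based up-weighting
    weights = weights / weights.mean()      # keep the overall scale comparable
    pos = np.exp(sim_pos / temp)
    negs = (weights * np.exp(sim_negs / temp)).sum()
    return float(-np.log(pos / (pos + negs)))
```

Setting `alpha=0` recovers plain InfoNCE; increasing it concentrates the penalty on the negatives the model currently confuses with the positive.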
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10
🧠 Researchers propose a data-efficient framework to convert generative Multimodal Large Language Models into universal embedding models without extensive pre-training. The method uses hierarchical embedding prompts and Self-aware Hard Negative Sampling to achieve competitive performance on embedding benchmarks using minimal training data.
AI · Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠 Researchers have created CzechTopic, a new benchmark dataset for evaluating AI models' ability to identify specific topics within historical Czech documents. The study compared various large language models and BERT-based models, finding significant performance variations, with the strongest models approaching human-level accuracy in topic detection.
AI · Neutral · arXiv – CS AI · Mar 3 · 4/10
🧠 Researchers have created CrimeNER, a specialized dataset of over 1,500 annotated crime-related documents for training named-entity recognition AI models. The study addresses the lack of quality training data in the crime domain by developing a database from terrorist attack reports and DOJ press notes, defining 22 types of crime-related entities.
AI · Bullish · Apple Machine Learning · Mar 3 · 5/10
🧠 EMBridge is a new AI framework that enhances gesture recognition from EMG biosignals by aligning them with high-quality structured data from videos and images. The technology enables zero-shot gesture generalization on low-power wearable devices, potentially advancing human-computer interaction applications.
AI · Neutral · arXiv – CS AI · Feb 27 · 4/10
🧠 Researchers benchmarked small language models (SLMs) for leader-follower role classification in human-robot interaction, finding that a fine-tuned Qwen2.5-0.5B achieves 86.66% accuracy with 22.2 ms latency. The study demonstrates that SLMs can effectively handle real-time role assignment for resource-constrained robots, though performance degrades with increased dialogue complexity.
AI · Neutral · Hugging Face Blog · Dec 21 · 4/10
🧠 The article appears to discuss CLIPSeg, a zero-shot image segmentation technology that can segment images without prior training on specific datasets. However, the article body is empty, making detailed analysis impossible.
AI · Neutral · arXiv – CS AI · Mar 3 · 4/10
🧠 Researchers developed a multi-condition digital twin calibration framework for axial piston pumps that can simulate compound faults and enable zero-shot fault diagnosis. The physics-data coupled approach addresses data scarcity issues in traditional fault detection methods and demonstrates accurate reproduction of both single and compound faults in hydraulic systems.