GP-Adapter: Gaussian Process CLIP-Adapter for Few-Shot Out-of-Distribution Detection
Researchers introduce GP-Adapter, a training-free framework combining CLIP with Gaussian Process uncertainty modeling to improve few-shot classification and out-of-distribution detection. The approach maintains CLIP's frozen backbone while adding probabilistic inference capabilities, requiring minimal computational overhead and achieving competitive performance on multiple benchmarks.
GP-Adapter addresses a fundamental limitation in large pre-trained vision-language models: their inability to quantify uncertainty in predictions. While CLIP revolutionized zero-shot learning through contrastive language-image training, it produces only deterministic similarity scores without confidence estimates. That leaves it vulnerable in low-data regimes and under distribution shift, which is exactly where uncertainty quantification matters most for reliable deployment.
The framework's core idea is modality-specific modeling: it builds separate one-class Gaussian Processes for image and text embeddings, using an RBF kernel for images and a linear kernel for text, then fuses their predictive statistics. CLIP's weights stay frozen, so no fine-tuning is required, and memory scales as O(CK²) for C classes and K shots. The method relies only on small K-shot caches and lightweight hyperparameter selection.
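The modality-specific one-class Gaussian Processes described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the common one-class GP formulation (regressing cached embeddings onto a constant target of 1), and the function names, the `gamma`, `noise`, and `beta` values, and the fusion rule (sum of predictive means minus a variance penalty) are all hypothetical choices made for the example. The embeddings are synthetic stand-ins for L2-normalized CLIP features.

```python
import numpy as np

def l2_normalize(x):
    # CLIP embeddings are typically compared on the unit sphere.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def rbf_kernel(a, b, gamma=5.0):
    # Squared Euclidean distances between all rows of a and b.
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def linear_kernel(a, b):
    return a @ b.T

def one_class_gp(support, query, kernel, noise=1e-2):
    """GP regression onto a constant target of 1 (assumed one-class setup).

    Returns the predictive mean and variance for each query row.
    """
    n = support.shape[0]
    gram = kernel(support, support) + noise * np.eye(n)  # K x K per class
    k_star = kernel(support, query)                      # shape (n, q)
    alpha = np.linalg.solve(gram, k_star)                # gram^{-1} k_star
    mean = alpha.T @ np.ones(n)
    var = np.diag(kernel(query, query)) - (k_star * alpha).sum(axis=0)
    return mean, var

def fused_class_score(image_shots, text_prompts, query, beta=1.0):
    # Hypothetical fusion rule: reward high means, penalize high variance.
    m_img, v_img = one_class_gp(image_shots, query, rbf_kernel)
    m_txt, v_txt = one_class_gp(text_prompts, query, linear_kernel)
    return (m_img + m_txt) - beta * (v_img + v_txt)

# Synthetic demo: one class clustered around the first basis vector.
rng = np.random.default_rng(0)
dim = 8
center = np.eye(dim)[0]
image_shots = l2_normalize(center + 0.05 * rng.standard_normal((4, dim)))
text_prompts = l2_normalize(center + 0.05 * rng.standard_normal((3, dim)))
queries = l2_normalize(np.stack([
    center + 0.05 * rng.standard_normal(dim),  # in-distribution query
    np.eye(dim)[1],                            # far-away (OOD-like) query
]))
scores = fused_class_score(image_shots, text_prompts, queries)
print(scores)  # the first score should exceed the second
```

In this toy setup the far-away query gets a low predictive mean and a high predictive variance under both kernels, so its fused score drops well below that of the in-distribution query. Thresholding such a score is one plausible way to flag out-of-distribution inputs.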
Experimentally, GP-Adapter shows consistent improvements in out-of-distribution detection across ImageNet and multiple OOD benchmarks, while maintaining competitive few-shot classification accuracy. The complementary relationship between probabilistic inference and prompt-learning baselines suggests that uncertainty modeling and prompt engineering address different aspects of the same problem: the former captures distributional confidence, while the latter optimizes decision boundaries.
For the broader AI community, this work validates that integrating classical probabilistic methods with modern deep learning models can enhance reliability without sacrificing efficiency. The training-free nature makes adoption straightforward for practitioners already using CLIP, positioning it as a practical augmentation layer rather than a replacement architecture.
- GP-Adapter adds uncertainty quantification to frozen CLIP embeddings without fine-tuning, improving out-of-distribution detection reliability.
- The framework uses modality-specific one-class Gaussian Processes with RBF kernels for images and linear kernels for text, combining their predictions into variance-aware confidence scores.
- Memory complexity scales as O(CK²) for C classes and K shots, maintaining efficiency with only small cached data and lightweight hyperparameter selection.
- Experiments demonstrate consistent improvements in OOD detection when combined with prompt-learning baselines, suggesting complementary benefits between approaches.
- The training-free design enables practitioners to augment existing CLIP deployments with probabilistic inference without architectural changes and with minimal computational overhead.
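To make the O(CK²) memory figure concrete, here is a small back-of-the-envelope calculation. It assumes one K-by-K Gram matrix is cached per class and stored as 32-bit floats. The helper name `gram_cache_bytes` and the example values (1000 classes, 16 shots) are illustrative assumptions, not settings reported in the source.

```python
def gram_cache_bytes(num_classes, shots, bytes_per_value=4):
    """Bytes needed to store one shots-by-shots Gram matrix per class."""
    return num_classes * shots * shots * bytes_per_value

# Illustrative values: 1000 classes, 16 shots per class, float32 storage.
print(gram_cache_bytes(1000, 16))  # 1024000 bytes, roughly 1 MB
```

Under these assumptions the cache grows linearly in the number of classes and quadratically in the number of shots, so doubling K quadruples the footprint.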