#clip-models News & Analysis

8 articles tagged with #clip-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

Evaluating Sample Utility for Efficient Data Selection by Mimicking Model Weights

Researchers introduce the Mimic Score, a geometry-based metric for evaluating data quality in large datasets by measuring gradient alignment with pre-trained models. The proposed Grad-Mimic framework enables efficient data selection, reducing training steps for CLIP models by 20.7% and filtering datasets without expensive computations or validation sets.
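
As a rough illustration of the geometric idea, the sketch below scores a sample by the cosine similarity between its negative gradient (the direction a descent step would move the weights) and the direction from the current weights toward a pre-trained reference model's weights. This is an assumption about how such an alignment score could be computed, not the paper's definition of the Mimic Score; `mimic_score` and `select_top_fraction` are hypothetical names, and the short vectors stand in for flattened model parameters.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

def mimic_score(sample_grad, current_weights, reference_weights):
    """Hypothetical alignment score between a sample's update and the reference direction."""
    # Direction the model would have to move to reach the reference weights.
    target_direction = [r - w for r, w in zip(reference_weights, current_weights)]
    # A plain gradient-descent step moves along the negative gradient.
    update_direction = [-g for g in sample_grad]
    return cosine(update_direction, target_direction)

def select_top_fraction(scores, fraction):
    """Indices of the highest-scoring `fraction` of samples, in original order."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

weights, reference = [0.0, 0.0], [1.0, 1.0]
grads = [[-1.0, -1.0], [1.0, 1.0], [-1.0, 1.0]]
scores = [mimic_score(g, weights, reference) for g in grads]
print(select_top_fraction(scores, 0.5))  # prints [0]
```

Samples whose updates point away from the reference weights score negatively and are the first candidates to filter out; no validation set is involved in computing the score itself.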

AI · Bearish · arXiv – CS AI · May 11 · 7/10

From Clouds to Hallucinations: Atmospheric Retrieval Hijacking in Remote Sensing Vision-Language RAG

Researchers introduce CloudWeb, an adversarial attack that manipulates remote sensing images with realistic cloud and haze patterns to hijack vision-language retrieval systems in multimodal RAG pipelines. On benchmark tests, the attack raises the rate of weather-related evidence injection from 0.71% to 43.29%, suggesting that input-space threats to the retrieval stage remain largely undefended in production systems.

🏢 OpenAI

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Beyond Templates: Revisiting Zero-Shot Remote Sensing through Meta-Prompting

Researchers analyze how vision-language models perform zero-shot remote sensing tasks across multiple datasets and find that textual design choices critically impact performance. The study reveals that semantically rich LLM-generated descriptions don't consistently outperform simpler template-based descriptions due to noise in text embeddings, but lightweight query embedding calibration effectively improves results.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Calibrating Uncertainty for Zero-Shot Adversarial CLIP

Researchers propose an adversarial fine-tuning method for CLIP that addresses a critical gap in zero-shot classification: while perturbations degrade accuracy, they also suppress uncertainty estimates, causing overconfidence. The approach reparameterizes CLIP outputs as Dirichlet distribution parameters to jointly optimize for robustness and calibrated uncertainty, achieving competitive results across benchmarks.
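
A minimal sketch of the Dirichlet reparameterization idea, assuming the common evidential-learning recipe: evidence is the softplus of each class logit, the concentration is α = evidence + 1, and uncertainty is K / Σα. The summary does not say which mapping the paper uses, so these functions are hypothetical, and the logits stand in for CLIP's scaled image-text similarities.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x)); maps any logit to non-negative evidence."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dirichlet_from_logits(logits):
    """Concentration parameters alpha_k = softplus(logit_k) + 1 (assumed mapping)."""
    return [softplus(z) + 1.0 for z in logits]

def dirichlet_summary(alphas):
    """Expected class probabilities and a vacuity-style uncertainty K / sum(alpha)."""
    strength = sum(alphas)
    expected = [a / strength for a in alphas]
    uncertainty = len(alphas) / strength
    return expected, uncertainty

# Strong evidence for one class versus almost no evidence for any class.
confident = dirichlet_summary(dirichlet_from_logits([20.0, -20.0, -20.0]))
uninformed = dirichlet_summary(dirichlet_from_logits([-20.0, -20.0, -20.0]))
print(confident[1], uninformed[1])
```

Low evidence for every class pushes the uncertainty toward 1; per the summary, adversarial perturbations tend to suppress this signal, which is what a joint robustness-and-calibration objective would try to preserve.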

AI · Bullish · arXiv – CS AI · Jun 1 · 6/10

On Revisiting Entropy for Identifying Mislabeled Images

Researchers propose a novel method called Signed Entropy Integral (SEI) to detect mislabeled images in training datasets by analyzing how prediction entropy changes during model training. The technique shows that correctly labeled samples exhibit consistent entropy decrease while mislabeled ones maintain high entropy, achieving state-of-the-art performance on medical imaging datasets.
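
The summary does not give SEI's formula, so the sketch below is only a hypothetical stand-in for the intuition it describes: it tracks one sample's prediction entropy across epochs and integrates, with the trapezoidal rule, how far entropy has fallen below its starting value. Correctly labeled samples, whose entropy keeps decreasing, accumulate a large positive area; samples whose entropy stays high remain near zero and can be flagged with a threshold. `signed_entropy_area`, `flag_mislabeled`, and the threshold value are illustrative assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def signed_entropy_area(prob_history):
    """Hypothetical score: trapezoidal integral of (H_0 - H_t) over epochs.

    prob_history is a list of per-epoch probability vectors for one sample.
    Positive values mean entropy dropped below its initial level.
    """
    h = [entropy(p) for p in prob_history]
    if len(h) < 2:
        return 0.0
    drops = [h[0] - ht for ht in h]
    return sum((drops[i] + drops[i + 1]) / 2.0 for i in range(len(drops) - 1))

def flag_mislabeled(histories, threshold):
    """Indices of samples whose entropy never fell substantially."""
    return [i for i, hist in enumerate(histories) if signed_entropy_area(hist) < threshold]

clean = [[0.5, 0.5], [0.8, 0.2], [0.99, 0.01]]  # entropy falls steadily
noisy = [[0.5, 0.5], [0.55, 0.45], [0.5, 0.5]]  # entropy stays near its maximum
print(flag_mislabeled([clean, noisy], threshold=0.1))  # prints [1]
```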

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

Rethinking Multimodal Few-Shot 3D Point Cloud Segmentation: From Fused Refinement to Decoupled Arbitration

Researchers present DA-FSS, a new deep learning model that improves 3D point cloud segmentation by decoupling semantic and geometric processing paths rather than fusing them together. The approach addresses fundamental limitations in existing multimodal few-shot learning methods, demonstrating superior performance on standard benchmark datasets.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

Open-SAT: LLM-Guided Query Embedding Refinement for Open-Vocabulary Object Retrieval in Satellite Imagery

Researchers introduce Open-SAT, a training-free algorithm that uses large language models (LLMs) to refine query embeddings for satellite image retrieval. The method improves on existing vision-language models by applying LLM-guided contextual refinement at inference time, achieving up to a 16% F1 score improvement on open-vocabulary satellite imagery tasks without additional training.
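
Open-SAT's exact refinement rule is not described in the summary. The sketch assumes one simple training-free variant, in which the normalized query embedding is nudged toward the mean embedding of LLM-generated descriptions and images are then re-ranked by cosine similarity. `refine_query`, `rank_images`, and the `weight` parameter are hypothetical; in practice the embeddings would come from a vision-language encoder such as CLIP rather than the toy vectors used here.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def refine_query(query_emb, description_embs, weight=0.5):
    """Hypothetical refinement: unit query plus `weight` times the mean unit description."""
    q = normalize(query_emb)
    if not description_embs:
        return q
    descs = [normalize(d) for d in description_embs]
    mean_desc = [sum(col) / len(descs) for col in zip(*descs)]
    return normalize([a + weight * b for a, b in zip(q, mean_desc)])

def rank_images(query_emb, image_embs):
    """Image indices sorted by descending cosine similarity to the query."""
    q = normalize(query_emb)
    scores = [sum(a * b for a, b in zip(q, normalize(img))) for img in image_embs]
    return sorted(range(len(image_embs)), key=lambda i: scores[i], reverse=True)

query = [1.0, 0.0, 0.0]
descriptions = [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]  # stand-ins for LLM-written context
images = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0]]
print(rank_images(query, images))  # prints [0, 1]
print(rank_images(refine_query(query, descriptions, weight=1.0), images))  # prints [1, 0]
```

Because refinement only changes the query vector, the image index can be reused unchanged, which is what makes this kind of approach training-free.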

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion

Researchers introduce CLIP-Inspector, a backdoor detection method for prompt-tuned CLIP models that reconstructs hidden triggers using out-of-distribution images to identify if a model has been maliciously compromised. The technique achieves 94% detection accuracy and enables post-hoc model repair, addressing critical security vulnerabilities in outsourced machine learning services.