AI · Bullish · arXiv – CS AI · 15h ago · 7/10
Evaluating Sample Utility for Efficient Data Selection by Mimicking Model Weights
Researchers introduce the Mimic Score, a geometry-based data-quality metric that rates each training sample by how well its gradient aligns with the direction toward the weights of a pre-trained reference model. Built on this score, the Grad-Mimic framework selects useful samples during training and filters large datasets without a validation set or expensive extra computation. For CLIP models, it reduces the number of training steps by 20.7%.
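The gradient-alignment idea can be illustrated with a small NumPy sketch. This is not the paper's implementation: it assumes the score is the projection of a sample's negative gradient onto the unit vector pointing from the current weights to the reference weights, and the names `mimic_scores`, `select_samples`, and the zero threshold are hypothetical choices made for illustration.

```python
import numpy as np


def mimic_scores(per_sample_grads, weights, ref_weights):
    """Score each sample by how far a gradient-descent step on it would
    move the model toward the reference weights.

    Assumed definition (illustrative only): the projection of the negative
    per-sample gradient onto the unit direction (ref_weights - weights).
    """
    direction = ref_weights - weights
    norm = np.linalg.norm(direction)
    if norm == 0.0:
        # The model already matches the reference, so no direction is defined.
        return np.zeros(per_sample_grads.shape[0])
    unit = direction / norm
    # Descent moves along -gradient, hence the sign flip.
    return -(per_sample_grads @ unit)


def select_samples(scores, threshold=0.0):
    """Return indices of samples whose score exceeds the threshold."""
    return np.flatnonzero(scores > threshold)


# Toy example: two parameters, three samples (one gradient per row).
weights = np.array([0.0, 0.0])
ref_weights = np.array([1.0, 0.0])
grads = np.array([
    [-2.0, 0.5],  # a descent step moves toward the reference
    [1.0, 0.0],   # a descent step moves away from the reference
    [0.0, 3.0],   # orthogonal to the reference direction
])

scores = mimic_scores(grads, weights, ref_weights)
kept = select_samples(scores)
```

In this toy setup only the first sample is kept, because it is the only one whose update points toward the reference weights. A real pipeline would compute per-sample gradients from the model's loss and would likely choose the threshold or a top-k cutoff empirically.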