y0news
🧠 AI | Neutral | Importance: 5/10

Visual Words Meet BM25: Sparse Auto-Encoder Visual Word Scoring for Image Retrieval

arXiv – CS AI | Donghoon Han, Eunhwan Park, Seunghyeon Seo
🤖 AI Summary

Researchers introduce BM25-V, an image retrieval method that applies BM25 scoring to sparse "visual word" activations, obtained by running a Sparse Auto-Encoder (SAE) over Vision Transformer features, for efficient and interpretable image search. The approach achieves above 99.3% Recall@200 across seven benchmarks, produces explainable results, and can serve as an efficient first-stage retriever ahead of a dense reranker.
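The summary does not say exactly how SAE activations become BM25 inputs. Below is a minimal sketch that assumes each image is already encoded as a sparse dictionary of visual-word activations and that each activation is used directly as a term frequency. Both are assumptions and are not confirmed by the summary. The sketch uses the common BM25 defaults k1=1.2 and b=0.75 and a Lucene-style IDF. The function names `build_index`, `idf` and `bm25_score` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical input format: one sparse dict per image, mapping an SAE
# visual-word index to its non-negative activation (treated as a term frequency).


def build_index(corpus):
    """Document frequency per visual word and mean image 'length' (sum of activations)."""
    df = Counter(w for doc in corpus for w, act in doc.items() if act > 0)
    avg_len = sum(sum(doc.values()) for doc in corpus) / len(corpus)
    return df, avg_len


def idf(word, df, n_docs):
    """Lucene-style BM25 IDF: rare visual words get large weights, common ones small."""
    n = df.get(word, 0)
    return math.log(1 + (n_docs - n + 0.5) / (n + 0.5))


def bm25_score(query, doc, df, n_docs, avg_len, k1=1.2, b=0.75):
    """Standard BM25 summed over the visual words active in the query image."""
    doc_len = sum(doc.values())
    score = 0.0
    for word, q_act in query.items():
        tf = doc.get(word, 0.0)
        if q_act <= 0 or tf <= 0:
            continue  # word absent from the query or the image: no contribution
        # Saturating term frequency with document-length normalization.
        norm = tf + k1 * (1 - b + b * doc_len / avg_len)
        score += idf(word, df, n_docs) * tf * (k1 + 1) / norm
    return score


if __name__ == "__main__":
    corpus = [{0: 1.0, 1: 2.0}, {0: 1.0, 2: 0.5}, {0: 1.5, 3: 1.0}]
    df, avg_len = build_index(corpus)
    query = {0: 1.0, 1: 1.0}
    scores = [bm25_score(query, d, df, len(corpus), avg_len) for d in corpus]
    print(sorted(range(len(corpus)), key=lambda i: -scores[i]))
```

In this toy corpus, visual word 0 appears in every image, so its IDF is small. Word 1 appears only in image 0, so matching it dominates the score. This is the behavior that a Zipfian visual vocabulary rewards.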

Key Takeaways
  • BM25-V applies BM25, a classic text-retrieval scoring function, to "visual words": sparse features produced by a Sparse Auto-Encoder trained on Vision Transformer features.
  • The method achieves over 99.3% Recall@200 across seven benchmarks. A dense reranker therefore only needs to rescore the top 200 candidates, which makes efficient two-stage retrieval pipelines possible.
  • Visual-word frequencies follow a Zipfian distribution: a few words are very common and most are rare. BM25's inverse document frequency (IDF) weighting, which downweights common terms, is therefore a natural fit for image retrieval.
  • A single model trained on ImageNet-1K transfers zero-shot to fine-grained benchmarks without additional fine-tuning.
  • Unlike dense retrieval methods, BM25-V provides interpretable results by attributing decisions to specific visual words with quantified contributions.
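The two-stage pipeline and word-level attribution described above can be sketched as follows. This is an illustration under stated assumptions. The sparse inputs are hypothetical visual-word activation dictionaries. Plain cosine similarity on hypothetical dense embeddings stands in for whatever reranker is actually used. The default `k=200` simply mirrors the Recall@200 figure. Because a BM25 score is a sum of per-word terms, each candidate carries a breakdown that attributes its score to individual visual words.

```python
import math
from collections import Counter


def bm25_contributions(query, doc, df, n_docs, avg_len, k1=1.2, b=0.75):
    """Per-visual-word BM25 terms for one image; their sum is the BM25 score."""
    doc_len = sum(doc.values())
    contrib = {}
    for word, q_act in query.items():
        tf = doc.get(word, 0.0)
        if q_act <= 0 or tf <= 0:
            continue  # only visual words active in both query and image contribute
        n = df.get(word, 0)
        idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
        norm = tf + k1 * (1 - b + b * doc_len / avg_len)
        contrib[word] = idf * tf * (k1 + 1) / norm
    return contrib


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def two_stage_search(query_sparse, query_dense, sparse_corpus, dense_corpus, k=200, top=10):
    """Stage 1: BM25 over visual words keeps k candidates. Stage 2: dense rerank."""
    n_docs = len(sparse_corpus)
    df = Counter(w for doc in sparse_corpus for w, a in doc.items() if a > 0)
    avg_len = sum(sum(doc.values()) for doc in sparse_corpus) / n_docs
    stage1 = []
    for i, doc in enumerate(sparse_corpus):
        contrib = bm25_contributions(query_sparse, doc, df, n_docs, avg_len)
        stage1.append((sum(contrib.values()), i, contrib))
    stage1.sort(key=lambda t: (-t[0], t[1]))  # highest BM25 first, ties by index
    candidates = stage1[:k]
    # Stage 2: rescore only the shortlist with dense cosine similarity.
    reranked = [(i, cosine(query_dense, dense_corpus[i]), contrib)
                for _, i, contrib in candidates]
    reranked.sort(key=lambda t: -t[1])
    return reranked[:top]  # (image index, dense score, per-word BM25 breakdown)
```

Stage 2 never sees images outside the stage-1 shortlist. A high Recall@k is what makes this safe. The returned breakdown can be read as "image i matched because of visual words w1, w2, ..., with these contributions."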