Projection and Quantisation: A Unifying View of Learning to Hash, from Random Projections to the RAG Era
Researchers present PQO, a framework that unifies diverse approximate nearest neighbor search methods under three design choices: projection placement, quantization thresholds, and code organization. The work reports that one-bit codes achieve 32x compression over float32 embeddings while recovering quality through re-ranking, and that supervised eight-byte codes double the retrieval quality of two-kilobyte float embeddings.
This research addresses a fragmentation problem in the retrieval systems that power modern RAG pipelines for large language models. By consolidating disparate hashing and indexing methods, from locality-sensitive hashing to graph-based indexes, under a single projection-quantization-organization (PQO) lens, the work gives practitioners a principled way to compare the trade-offs between approaches.
The research emerges from practical challenges in scaling retrieval systems as LLMs become increasingly central to AI applications. Vector databases and semantic search have exploded across the industry, yet the underlying compression and indexing methods remain scattered across academic communities with limited cross-pollination. This fragmentation increases implementation complexity and prevents systematic optimization.
For developers and infrastructure teams, the BitBudget benchmark offers immediate value by enabling empirical comparison of compression strategies. The finding that one-bit quantization achieves dramatic memory savings while preserving quality through candidate re-ranking suggests significant optimization opportunities for production systems managing massive embedding collections. This is particularly relevant for organizations deploying RAG systems where retrieval latency and memory efficiency directly impact operational costs and model responsiveness.
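The two-stage pattern described above (a cheap shortlist from one-bit codes, then re-ranking with the original floats) can be sketched in a few lines. This is a minimal illustration and not the paper's implementation: it assumes plain sign thresholding at zero, random Gaussian vectors as stand-in embeddings, and hypothetical helper names (`binarize`, `hamming`, `search`).

```python
import random

def binarize(vec):
    """Pack the sign pattern of a float vector into one int (1 bit per dimension)."""
    code = 0
    for x in vec:
        code = (code << 1) | (1 if x > 0 else 0)
    return code

def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def search(query, vectors, codes, shortlist=20, k=5):
    """Shortlist by Hamming distance on 1-bit codes, then re-rank with floats."""
    q_code = binarize(query)
    # Stage 1: rank every item using only its compact binary code.
    candidates = sorted(range(len(codes)),
                        key=lambda i: hamming(q_code, codes[i]))[:shortlist]
    # Stage 2: touch full-precision vectors only for the shortlist.
    return sorted(candidates, key=lambda i: -dot(query, vectors[i]))[:k]

# Synthetic stand-in data (assumption: real embeddings would come from a model).
rng = random.Random(0)
dim = 64
vectors = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(500)]
codes = [binarize(v) for v in vectors]

# Query: a lightly perturbed copy of item 42.
query = [x + 0.1 * rng.gauss(0, 1) for x in vectors[42]]
top = search(query, vectors, codes)

bits_float32 = 32 * dim   # storage per vector as float32
bits_binary = 1 * dim     # storage per vector as a sign code
print(top, bits_float32 // bits_binary)
```

Only the shortlist ever needs the full-precision vectors, which is why the floats can live in slower or cheaper storage while the binary codes stay in memory. The shortlist size is the knob that trades re-ranking cost against recall.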
The framework's applicability across embedding scales and supervision scenarios suggests it could serve as a reference model for designing retrieval systems. By explicitly reframing generative retrieval's semantic identifiers as quantization codes, it connects that line of work to the same design choices as classical hashing and quantization. Organizations can use it to evaluate systematically whether investing in learned binary hashing or product quantization yields better cost-performance than simpler baselines for their specific use cases.
- One-bit quantized codes achieve 32x compression over float32 embeddings with full quality recovery through re-ranking
- The PQO framework unifies hashing, quantization, and indexing methods under three design choices applicable across the field's history
- Supervised eight-byte codes double the retrieval quality of standard two-kilobyte float embeddings
- Trade-off orderings predicted by the framework remain consistent as embedding dimensions grow
- The BitBudget benchmark enables empirical measurement and optimization of compression-retrieval quality trade-offs
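The storage figures in the points above follow from simple arithmetic, worked through here as a quick check. The sketch assumes the two-kilobyte embedding is stored as float32 and that one kilobyte is 1024 bytes; the summary states neither explicitly, and the function name is illustrative.

```python
def compression_ratio(bits_per_dim_full=32, bits_per_dim_code=1):
    """Ratio of full-precision to coded storage; the dimension count cancels out."""
    return bits_per_dim_full / bits_per_dim_code

# float32 (32 bits per dimension) versus a one-bit code per dimension.
ratio_one_bit = compression_ratio()   # 32.0

# A 2 KB float32 embedding (assumption: 2048 bytes, 4 bytes per dimension).
embedding_bytes = 2 * 1024
dims_in_2kb = embedding_bytes // 4    # 512 dimensions

# An eight-byte code is 64 bits, whatever the original dimensionality.
code_bytes = 8
ratio_8_byte = embedding_bytes / code_bytes   # 256.0

print(ratio_one_bit, dims_in_2kb, ratio_8_byte)
```

Under these assumptions the eight-byte code is 256 times smaller than the float embedding it is compared against, a considerably more aggressive budget than the 32x of plain one-bit-per-dimension codes.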