An unofficial Rust implementation of MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings.
- This is an unofficial implementation and is not affiliated with Google Research or the original authors.
- Some parts of this README were generated by AI.
MuVERA is a breakthrough algorithm that enables efficient multi-vector similarity search by reducing it to single-vector search. This implementation provides the core Fixed Dimensional Encoding (FDE) functionality that transforms variable-length token embeddings into fixed-dimensional vectors while preserving similarity relationships.
Multi-vector models (like ColBERT) produce one embedding per token, so each query or document is represented by many vectors, which yields superior retrieval quality compared to single-vector models. However, multi-vector retrieval is computationally expensive because scoring uses the Chamfer similarity between sets of vectors.
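To make that cost concrete, Chamfer similarity (MaxSim-style scoring) can be sketched in a few lines of NumPy; the token counts and dimension below are illustrative, not values used by this crate:

```python
import numpy as np

def chamfer_similarity(query: np.ndarray, doc: np.ndarray) -> float:
    """Chamfer similarity: for each query token, take the maximum inner
    product over all document tokens, then sum over query tokens.
    query: (num_query_tokens, dim), doc: (num_doc_tokens, dim)."""
    scores = query @ doc.T  # all pairwise inner products
    return float(scores.max(axis=1).sum())

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8)).astype(np.float32)
d = rng.standard_normal((16, 8)).astype(np.float32)
print(chamfer_similarity(q, d))
```

Scoring one query against N documents this way costs O(N · num_query_tokens · num_doc_tokens · dim) inner-product work, which is exactly what MuVERA sidesteps.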
MuVERA solves this by:
- Fixed Dimensional Encoding (FDE): Transforms sets of token embeddings into fixed-dimensional vectors using randomized hyperplanes and hashing
- Asymmetric Encoding: Uses sum aggregation for queries and average aggregation for documents
- Single-Vector MIPS: Enables the use of highly-optimized Maximum Inner Product Search algorithms
- Theoretical Guarantees: Provides ε-approximation guarantees for Chamfer similarity
The key insight is that the dot product of FDEs approximates the true multi-vector Chamfer similarity, allowing retrieval systems to leverage decades of optimization in single-vector search while maintaining multi-vector quality.
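As a rough illustration of that insight (a toy sketch of the paper's scheme, not this crate's actual code — in particular, the output dimensionality here is `buckets * dim`, whereas this crate returns a `buckets`-dimensional vector), tokens can be bucketed by SimHash-style random hyperplanes, aggregated per bucket (sum for queries, average for documents), and the FDE dot product compared against the exact Chamfer score:

```python
import numpy as np

def toy_fde(tokens: np.ndarray, planes: np.ndarray, mode: str) -> np.ndarray:
    """Toy FDE: bucket tokens by the sign pattern of random projections,
    then sum ("sum") or average ("avg") the tokens in each bucket.
    tokens: (n, d); planes: (k, d) -> 2**k buckets; output: (2**k * d,)."""
    k, d = planes.shape
    bits = (tokens @ planes.T > 0).astype(np.int64)  # (n, k) sign bits
    bucket = bits @ (1 << np.arange(k))              # (n,) bucket ids
    out = np.zeros((1 << k, d))
    counts = np.zeros(1 << k)
    for tok, b in zip(tokens, bucket):
        out[b] += tok
        counts[b] += 1
    if mode == "avg":
        nz = counts > 0
        out[nz] /= counts[nz][:, None]
    return out.ravel()

rng = np.random.default_rng(42)
planes = rng.standard_normal((4, 8))   # 4 hyperplanes -> 16 buckets
q = rng.standard_normal((6, 8))
doc = rng.standard_normal((40, 8))
q_fde = toy_fde(q, planes, "sum")      # queries: sum per bucket
d_fde = toy_fde(doc, planes, "avg")    # documents: average per bucket
approx = float(q_fde @ d_fde)          # FDE dot product
exact = float((q @ doc.T).max(axis=1).sum())  # exact Chamfer similarity
print(approx, exact)
```

The intuition: a query token and the document tokens closest to it tend to land in the same hyperplane bucket, so the per-bucket inner products in the FDE dot product mimic the per-query-token max in Chamfer similarity.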
- ✅ Core FDE Implementation: Complete Fixed Dimensional Encoding with configurable buckets
- ✅ Batch Processing: Efficient parallel encoding for large datasets
- ✅ Type Safety: Generic implementation supporting `f32` and `f64`
- ✅ Memory Efficient: Uses `ndarray` for optimized vector operations
- ✅ Deterministic: Reproducible results with seed-based randomization
- ✅ Comprehensive Testing: Extensive test suite covering edge cases
Add to your `Cargo.toml`:

```toml
[dependencies]
muvera-rs = "0.1.0"
```

Install from PyPI:

```shell
pip install muvera
```

Or install from source:

```shell
pip install maturin
git clone https://github.com/NewBornRustacean/muvera-rs.git
cd muvera-rs
maturin develop --features python-bindings
```

After building with maturin, you can test the Python bindings interactively:
```python
import numpy as np
import muvera

embeddings = np.random.randn(32, 768).astype(np.float32)
result = muvera.encode_fde(embeddings, "mean")
print(result)
```

Or save the above as a script and run it with `python my_test.py`.
```python
import muvera
import numpy as np

# Initialize the encoder once (storage/index time): 128 buckets, 768-dim embeddings
encoder = muvera.PyFDEEncoder(128, 768, seed=42)

# Encode a document
doc_embeddings = np.random.randn(32, 768).astype(np.float32)
doc_fde = encoder.encode(doc_embeddings, "mean")  # (128,) vector

# Encode a query
query_embeddings = np.random.randn(10, 768).astype(np.float32)
query_fde = encoder.encode(query_embeddings, "sum")  # (128,) vector
```

```rust
use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use ndarray::Array2;

fn main() {
    // Create an encoder with 1024 buckets for 768-dimensional embeddings.
    // The resulting vector is ONLY 1024-dimensional, NOT 1024 * 768.
    let encoder = FDEEncoder::new(1024, 768, 42);
    let tokens = Array2::from_shape_vec((32, 768), vec![0.1; 32 * 768]).unwrap();

    // Encode a query (sum aggregation, as in the paper)
    let query_fde = encoder.encode_query(tokens.view()); // Array1<f32> of length 1024

    // Encode a document (average aggregation)
    let doc_fde = encoder.encode_doc(tokens.view()); // Array1<f32> of length 1024
}
```

```rust
use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use ndarray::Array3;

fn encode_batch() {
    let encoder = FDEEncoder::new(128, 768, 42);

    // Batch of token embeddings: (batch_size, num_tokens, embedding_dim)
    let batch_tokens = Array3::from_shape_vec((100, 32, 768), vec![0.1; 100 * 32 * 768]).unwrap();

    // Encode the entire batch in parallel
    let batch_fdes = encoder.encode_query_batch(batch_tokens.view());
    println!("Encoded batch shape: {:?}", batch_fdes.shape());
}
```

```rust
use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use muvera_rs::types::Aggregation;
use ndarray::Array2;

fn custom_encoding() {
    let encoder = FDEEncoder::new(128, 768, 42);
    let tokens = Array2::from_shape_vec((32, 768), vec![0.1; 32 * 768]).unwrap();

    // Choose the aggregation mode explicitly
    let fde_sum = encoder.encode(tokens.view(), Aggregation::Sum);
    let fde_avg = encoder.encode(tokens.view(), Aggregation::Avg);
}
```

`FDEEncoder` is the main encoder struct that implements the Fixed Dimensional Encoding algorithm.
```rust
pub fn new(buckets: usize, dim: usize, seed: u64) -> Self
```

- `buckets`: Number of hash buckets (hyperplanes)
- `dim`: Dimensionality of input token embeddings
- `seed`: Random seed for reproducible hyperplane initialization
- `encode(tokens, mode)`: Encode a single multi-vector
- `batch_encode(tokens, mode)`: Encode a batch of multi-vectors (parallel)
- `encode_query(tokens)`: Encode a query with sum aggregation
- `encode_doc(tokens)`: Encode a document with average aggregation
- `encode_query_batch(tokens)`: Batch-encode queries
- `encode_doc_batch(tokens)`: Batch-encode documents
Enum defining aggregation modes:

- `Sum`: Sum all tokens in each bucket
- `Avg`: Average all tokens in each bucket
- Buckets: More buckets give a better approximation but use more memory
- Batch Size: Use batch encoding for multiple vectors to leverage parallelism
- Memory: FDE vectors have `buckets` elements each
- Precision: `f32` is typically sufficient and faster than `f64`
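As a back-of-the-envelope comparison, using the `buckets`-dimensional output shown in the usage examples above (the token count, dimension, and bucket count here are illustrative):

```python
# Rough per-document memory, assuming f32 (4 bytes per element) storage.
num_tokens, dim, buckets = 128, 768, 1024
raw_bytes = num_tokens * dim * 4  # original multi-vector token embeddings
fde_bytes = buckets * 4           # one fixed-dimensional encoding
print(raw_bytes // 1024, "KiB raw vs", fde_bytes // 1024, "KiB FDE")
```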
- MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings
Laxman Dhulipala, Jason Lee, Majid Hadian, Rajesh Jayaram, Vahab Mirrokni
arXiv:2405.19504
- MuVERA: Making Multi-Vector Retrieval as Fast as Single-Vector Search
Google Research Blog
- Google Graph Mining Repository
github.com/google/graph-mining/tree/main/sketching/point_cloud
- Benchmark Suite
  - Integration with BEIR datasets (MS MARCO, etc.)
  - Memory usage and compression analysis
- Advanced Features
  - BLAS support
  - Product Quantization (PQ): 32x compression with minimal quality loss
  - Final Projections: Dimensionality reduction techniques
This project is licensed under the MIT License - see the LICENSE file for details.