An unofficial Rust implementation of MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings.
- This is an unofficial implementation and is not affiliated with Google Research or the original authors.
- Some parts of this README were generated by AI.
MuVERA is a breakthrough algorithm that enables efficient multi-vector similarity search by reducing it to single-vector search. This implementation provides the core Fixed Dimensional Encoding (FDE) functionality that transforms variable-length token embeddings into fixed-dimensional vectors while preserving similarity relationships.
Multi-vector models (like ColBERT) produce one embedding per token, so each query or document is represented by many vectors, which yields superior retrieval quality compared to single-vector models. However, multi-vector retrieval is computationally expensive because scoring uses the Chamfer similarity between sets of vectors.
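To make that cost concrete, Chamfer similarity (MaxSim-style scoring) can be sketched in a few lines of NumPy; the token counts and dimension below are illustrative, not values used by this crate:

```python
import numpy as np

def chamfer_similarity(query: np.ndarray, doc: np.ndarray) -> float:
    """Chamfer similarity: for each query token, take the maximum inner
    product over all document tokens, then sum over query tokens.
    query: (num_query_tokens, dim), doc: (num_doc_tokens, dim)."""
    scores = query @ doc.T  # all pairwise inner products
    return float(scores.max(axis=1).sum())

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8)).astype(np.float32)
d = rng.standard_normal((16, 8)).astype(np.float32)
print(chamfer_similarity(q, d))
```

Scoring one query against N documents this way costs O(N · num_query_tokens · num_doc_tokens · dim) inner-product work, which is exactly what MuVERA sidesteps.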
MuVERA solves this by:
- Fixed Dimensional Encoding (FDE): Transforms sets of token embeddings into fixed-dimensional vectors using randomized hyperplanes and hashing
- Asymmetric Encoding: Uses sum aggregation for queries and average aggregation for documents
- Single-Vector MIPS: Enables the use of highly-optimized Maximum Inner Product Search algorithms
- Theoretical Guarantees: Provides ε-approximation guarantees for Chamfer similarity
The key insight is that the dot product of FDEs approximates the true multi-vector Chamfer similarity, allowing retrieval systems to leverage decades of optimization in single-vector search while maintaining multi-vector quality.
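As a rough illustration of that insight (a toy sketch of the paper's scheme, not this crate's actual code — in particular, the output dimensionality here is `buckets * dim`, whereas this crate returns a `buckets`-dimensional vector), tokens can be bucketed by SimHash-style random hyperplanes, aggregated per bucket (sum for queries, average for documents), and the FDE dot product compared against the exact Chamfer score:

```python
import numpy as np

def toy_fde(tokens: np.ndarray, planes: np.ndarray, mode: str) -> np.ndarray:
    """Toy FDE: bucket tokens by the sign pattern of random projections,
    then sum ("sum") or average ("avg") the tokens in each bucket.
    tokens: (n, d); planes: (k, d) -> 2**k buckets; output: (2**k * d,)."""
    k, d = planes.shape
    bits = (tokens @ planes.T > 0).astype(np.int64)  # (n, k) sign bits
    bucket = bits @ (1 << np.arange(k))              # (n,) bucket ids
    out = np.zeros((1 << k, d))
    counts = np.zeros(1 << k)
    for tok, b in zip(tokens, bucket):
        out[b] += tok
        counts[b] += 1
    if mode == "avg":
        nz = counts > 0
        out[nz] /= counts[nz][:, None]
    return out.ravel()

rng = np.random.default_rng(42)
planes = rng.standard_normal((4, 8))   # 4 hyperplanes -> 16 buckets
q = rng.standard_normal((6, 8))
doc = rng.standard_normal((40, 8))
q_fde = toy_fde(q, planes, "sum")      # queries: sum per bucket
d_fde = toy_fde(doc, planes, "avg")    # documents: average per bucket
approx = float(q_fde @ d_fde)          # FDE dot product
exact = float((q @ doc.T).max(axis=1).sum())  # exact Chamfer similarity
print(approx, exact)
```

The intuition: a query token and the document tokens closest to it tend to land in the same hyperplane bucket, so the per-bucket inner products in the FDE dot product mimic the per-query-token max in Chamfer similarity.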
- ✅ Core FDE Implementation: Complete Fixed Dimensional Encoding with configurable buckets
- ✅ Batch Processing: Efficient parallel encoding for large datasets
- ✅ Type Safety: Generic implementation supporting `f32` and `f64`
- ✅ Memory Efficient: Uses `ndarray` for optimized vector operations
- ✅ Deterministic: Reproducible results with seed-based randomization
- ✅ Comprehensive Testing: Extensive test suite covering edge cases
Add to your `Cargo.toml`:

```toml
[dependencies]
muvera-rs = "0.1.0"
```

Install from PyPI:

```shell
pip install muvera
```

Or install from source:

```shell
pip install maturin
git clone https://github.com/NewBornRustacean/muvera-rs.git
cd muvera-rs
maturin develop --features python-bindings
```

After building with maturin, you can test the Python bindings interactively:
```python
import numpy as np
import muvera

embeddings = np.random.randn(32, 768).astype(np.float32)
result = muvera.encode_fde(embeddings, "mean")
print(result)
```

Or save the above as a script and run it with `python my_test.py`.
```python
import muvera
import numpy as np

# Initialize the encoder once (storage/index time): 128 buckets, 768-dim embeddings
encoder = muvera.PyFDEEncoder(128, 768, seed=42)

# Encode a document
doc_embeddings = np.random.randn(32, 768).astype(np.float32)
doc_fde = encoder.encode(doc_embeddings, "mean")  # (128,) vector

# Encode a query
query_embeddings = np.random.randn(10, 768).astype(np.float32)
query_fde = encoder.encode(query_embeddings, "sum")  # (128,) vector
```

```rust
use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use ndarray::Array2;

fn main() {
    // Create an encoder with 1024 buckets for 768-dimensional embeddings.
    // The resulting vector is ONLY 1024-dimensional, NOT 1024 * 768.
    let encoder = FDEEncoder::new(1024, 768, 42);
    let tokens = Array2::from_shape_vec((32, 768), vec![0.1; 32 * 768]).unwrap();

    // Encode a query (sum aggregation, as in the paper)
    let query_fde = encoder.encode_query(tokens.view()); // Array1<f32> of length 1024

    // Encode a document (average aggregation)
    let doc_fde = encoder.encode_doc(tokens.view()); // Array1<f32> of length 1024
}
```

```rust
use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use ndarray::Array3;

fn encode_batch() {
    let encoder = FDEEncoder::new(128, 768, 42);

    // Batch of token embeddings: (batch_size, num_tokens, embedding_dim)
    let batch_tokens = Array3::from_shape_vec((100, 32, 768), vec![0.1; 100 * 32 * 768]).unwrap();

    // Encode the entire batch in parallel
    let batch_fdes = encoder.encode_query_batch(batch_tokens.view());
    println!("Encoded batch shape: {:?}", batch_fdes.shape());
}
```

```rust
use muvera_rs::encoder::fde_encoder::{FDEEncoder, FDEEncoding};
use muvera_rs::types::Aggregation;
use ndarray::Array2;

fn custom_encoding() {
    let encoder = FDEEncoder::new(128, 768, 42);
    let tokens = Array2::from_shape_vec((32, 768), vec![0.1; 32 * 768]).unwrap();

    // Choose the aggregation mode explicitly
    let fde_sum = encoder.encode(tokens.view(), Aggregation::Sum);
    let fde_avg = encoder.encode(tokens.view(), Aggregation::Avg);
}
```

`FDEEncoder` is the main encoder struct that implements the Fixed Dimensional Encoding algorithm.
```rust
pub fn new(buckets: usize, dim: usize, seed: u64) -> Self
```

- `buckets`: Number of hash buckets (hyperplanes)
- `dim`: Dimensionality of input token embeddings
- `seed`: Random seed for reproducible hyperplane initialization
- `encode(tokens, mode)`: Encode a single multi-vector
- `batch_encode(tokens, mode)`: Encode a batch of multi-vectors (parallel)
- `encode_query(tokens)`: Encode a query with sum aggregation
- `encode_doc(tokens)`: Encode a document with average aggregation
- `encode_query_batch(tokens)`: Batch-encode queries
- `encode_doc_batch(tokens)`: Batch-encode documents
Enum defining aggregation modes:

- `Sum`: Sum all tokens in each bucket
- `Avg`: Average all tokens in each bucket
- Buckets: More buckets give a better approximation but use more memory
- Batch Size: Use batch encoding for multiple vectors to leverage parallelism
- Memory: FDE vectors have `buckets` elements each
- Precision: `f32` is typically sufficient and faster than `f64`
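As a back-of-the-envelope comparison, using the `buckets`-dimensional output shown in the usage examples above (the token count, dimension, and bucket count here are illustrative):

```python
# Rough per-document memory, assuming f32 (4 bytes per element) storage.
num_tokens, dim, buckets = 128, 768, 1024
raw_bytes = num_tokens * dim * 4  # original multi-vector token embeddings
fde_bytes = buckets * 4           # one fixed-dimensional encoding
print(raw_bytes // 1024, "KiB raw vs", fde_bytes // 1024, "KiB FDE")
```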
- MuVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings
Laxman Dhulipala, Jason Lee, Majid Hadian, Rajesh Jayaram, Vahab Mirrokni
arXiv:2405.19504
- MuVERA: Making Multi-Vector Retrieval as Fast as Single-Vector Search
Google Research Blog
- Google Graph Mining Repository
github.com/google/graph-mining/tree/main/sketching/point_cloud
- Benchmark Suite
  - Integration with BEIR datasets (MS MARCO, etc.)
  - Memory usage and compression analysis
- Advanced Features
  - BLAS support
  - Product Quantization (PQ): 32x compression with minimal quality loss
  - Final Projections: Dimensionality reduction techniques
This project is licensed under the MIT License - see the LICENSE file for details.