A pure Python implementation of FAISS (Facebook AI Similarity Search) optimized for Apple Silicon using MLX for Metal acceleration.
- Pure Python Implementation: No C++ dependencies, easy to install and modify
- Metal Acceleration: Optimized for Apple Silicon using MLX framework
- Competitive Performance: Up to ~20x speedup in specialized batched cases, and sub-millisecond operations in the micro-benchmarks below
- FAISS Compatible: API similar to the original FAISS library
- Lazy Evaluation: Efficient computation graphs with MLX
- MLX-only: Requires MLX on Apple Silicon (Metal). No fallbacks.
- Python 3.8+
- MLX (required)
git clone https://github.com/SolaceHarmony/MetalFaiss.git
cd MetalFaiss/python
pip install -e .
pip install mlx
import metalfaiss
# Create vectors (embeddings)
embeddings = [
[0.1, 0.2, 0.3],
[0.4, 0.5, 0.6],
[0.7, 0.8, 0.9],
[1.0, 1.1, 1.2],
[1.3, 1.4, 1.5]
]
# Create index and add vectors
d = len(embeddings[0]) # dimension
index = metalfaiss.FlatIndex(d, metalfaiss.MetricType.L2)
index.add(embeddings)
# Search for similar vectors
query = [[0.1, 0.5, 0.9]]
k = 3 # number of nearest neighbors
result = index.search(query, k)
print(f"Distances: {result.distances}")
print(f"Labels: {result.labels}")
Run the included examples:
cd python
python basic_usage.py
python advanced_usage.py
- L2 (Euclidean): MetricType.L2
- L1 (Manhattan): MetricType.L1
- L∞ (Chebyshev): MetricType.LINF
- Inner Product: MetricType.INNER_PRODUCT
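
To make the four metrics concrete, here is a minimal pure-Python sketch of what each one computes for a pair of vectors. The function names are illustrative and are not part of the metalfaiss API. Note that the original FAISS reports squared distances for its L2 metric; whether MetalFaiss does the same is not stated here, so check the returned values before comparing against a threshold.

```python
import math

# Illustrative helpers only; these are not metalfaiss functions.
def l2(a, b):
    """Euclidean distance (the square root of the summed squared differences)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def l1(a, b):
    """Manhattan distance (sum of absolute differences)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def linf(a, b):
    """Chebyshev distance (largest absolute difference)."""
    return max(abs(x - y) for x, y in zip(a, b))

def inner_product(a, b):
    """Dot product; larger means more similar, unlike the distances above."""
    return sum(x * y for x, y in zip(a, b))

a, b = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
print(l2(a, b), l1(a, b), linf(a, b), inner_product(a, b))  # 5.0 7.0 4.0 25.0
```
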
# Create different index types
index_l2 = metalfaiss.FlatIndex(d, metalfaiss.MetricType.L2)
index_ip = metalfaiss.FlatIndex(d, metalfaiss.MetricType.INNER_PRODUCT)
# Add vectors to index
index.add(vectors)
# Search for k nearest neighbors
result = index.search(query_vectors, k=5)
# Range search (find all vectors within distance threshold)
range_result = index.range_search(query_vectors, radius=0.5)
# Reconstruct stored vectors
reconstructed = index.reconstruct(vector_id)
MetalFAISS provides excellent performance characteristics on Apple Silicon:
- Metal Acceleration: Runs on the GPU through MLX's Metal backend
- Competitive Speed: Matches or exceeds traditional FAISS in specialized cases
- Lazy Evaluation: Only computes what's needed when it's needed
- Memory Efficient: Optimized memory usage patterns
- Parallel Processing: Automatic parallelization on supported hardware
| Metric | MetalFAISS | Industry Standard | Notes |
|---|---|---|---|
| IVF Search | 1.5ms | Faiss cuVS: 0.39ms (H100) | Specialized batched case |
| QR Projection | 0.38ms | Faiss SIMD: ~0.1-0.3ms | Competitive on consumer HW |
| Pure Python | Yes | Faiss: C++ required | Zero compilation needed |
View Detailed Benchmarks - Complete performance analysis with competitive comparisons
Competitive Analysis - Industry positioning and trade-off analysis
Micro-benchmarks on Apple Silicon (MLX, float32). Numbers are median wall-clock and vary by device/driver; treat as indicative. Reproduce with the commands below. Full charts and raw CSVs live in docs/benchmarks.
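
For readers who want to take comparable measurements of their own, the following is a generic sketch of a median wall-clock timer. It is not the repository's benchmark harness, and median_ms is a hypothetical helper name. The workload here is a plain-Python stand-in so the snippet runs anywhere.

```python
import statistics
import time

def median_ms(fn, warmup=3, repeats=20):
    """Return the median wall-clock time of fn() in milliseconds.

    Hypothetical helper, not part of metalfaiss.
    """
    for _ in range(warmup):          # warm caches and any one-time setup
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)

# Stand-in workload. With MLX, the callable must force evaluation
# (for example by calling mx.eval on its result); otherwise only graph
# construction is timed, because MLX evaluates lazily.
elapsed = median_ms(lambda: sum(i * i for i in range(10_000)))
print(f"median: {elapsed:.3f} ms")
```
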
GEMM (A@V)
| Shape (m×n, k) | MLX matmul | MLX kernels | Torch (MPS) |
|---|---|---|---|
| 256×128, 16 | ~0.2 ms | ~0.2 ms | ~0.2 ms |
| 512×256, 32 | ~0.2 ms | ~0.2 ms | ~0.2 ms |
| 1024×512, 64 | ~0.2 ms | ~0.2 ms | ~0.2 ms |
Z-step Aᵀ(A·V)
| Shape (m×n, k) | MLX (mx.matmul) | MLX (kernels) | Torch (MPS) |
|---|---|---|---|
| 256×128, 16 | ~0.2 ms | ~0.3 ms | ~0.3 ms |
| 512×256, 32 | ~0.2 ms | ~0.3 ms | ~0.3 ms |
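
As a reference for what the Z-step computes, here is a small pure-Python sketch. It assumes A has shape m×n and V has shape n×k, which is how the table header reads, though the source does not spell this out. Evaluating Aᵀ(A·V) as two products avoids forming the n×n matrix AᵀA, which is cheaper when k is much smaller than n.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def z_step(A, V):
    """Compute A^T (A V) as two products (illustrative, not the MLX kernel)."""
    B = matmul(A, V)                 # (m x n) @ (n x k) -> m x k
    return matmul(transpose(A), B)   # (n x m) @ (m x k) -> n x k

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # m=3, n=2
V = [[1.0], [0.0]]                          # n=2, k=1
Z = z_step(A, V)
print(Z)  # [[35.0], [44.0]]
```
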
IVFFlat query (d=64, N=32k, nlist=128, Q=16)
| nprobe | k | Baseline MLX | Fused + device merge | Fused batched (same X) |
|---|---|---|---|---|
| 1 | 10 | 14.3 ms | 13.7 ms | 0.9 ms |
| 8 | 10 | 126.6 ms | 140.7 ms | 7.6 ms |
| 1 | 32 | 19.8 ms | 19.9 ms | 1.1 ms |
| 8 | 32 | 136.5 ms | 171.5 ms | 8.3 ms |
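
The relative speedups implied by the IVFFlat table can be worked out directly. The sketch below copies the timings from the table and divides the baseline by each variant; nothing here touches MetalFaiss itself.

```python
# (nprobe, k) -> (baseline MLX, fused + device merge, fused batched), in ms,
# copied from the IVFFlat table above.
rows = {
    (1, 10): (14.3, 13.7, 0.9),
    (8, 10): (126.6, 140.7, 7.6),
    (1, 32): (19.8, 19.9, 1.1),
    (8, 32): (136.5, 171.5, 8.3),
}

speedups = {}
for (nprobe, k), (baseline, fused, batched) in rows.items():
    # A ratio above 1 means the variant is faster than the baseline.
    speedups[(nprobe, k)] = (round(baseline / fused, 2), round(baseline / batched, 1))
    print(f"nprobe={nprobe} k={k}: fused {speedups[(nprobe, k)][0]}x, "
          f"batched {speedups[(nprobe, k)][1]}x")
```

On these numbers the fused batched path is roughly 16x to 18x faster than the baseline, while the fused + device merge path is slower than the baseline at nprobe=8 (ratios of about 0.9 and 0.8).
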
Run locally
- GEMM sweep: METALFAISS_USE_GEMM_KERNEL=1 python -m python.metalfaiss.unittest.test_kernel_autotune_bench
- IVF perf: METALFAISS_USE_IVF_TOPK=1 python -m python.metalfaiss.unittest.test_ivf_benchmarks
- PyTorch vs MLX (GEMM): METALFAISS_USE_GEMM_KERNEL=1 python -m python.metalfaiss.unittest.test_torch_vs_mlx_bench
- Generate CSVs + charts: PYTHONPATH=python python python/metalfaiss/benchmarks/run_benchmarks.py
- Validate doc tables vs CSVs: PYTHONPATH=python python docs/benchmarks/validate_results.py
- Provenance (device, commit, versions): see docs/benchmarks/bench_meta.json
Details on GEMM flags and tuning in docs/mlx/GEMM-Kernels.md.
MLX (Machine Learning for Apple silicon) provides:
- Metal Backend: GPU acceleration on Apple Silicon
- Lazy Evaluation: Build computation graphs, execute efficiently
- Unified Memory: Efficient memory management between CPU/GPU
- Apple Silicon Optimization: Native performance on M1/M2/M3 chips
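
The lazy-evaluation point is easiest to see in a toy model. The sketch below is a conceptual illustration in plain Python, not MLX code and not how MLX is implemented: operations only record a graph, and work happens when a result is requested.

```python
class Lazy:
    """A node in a tiny deferred-computation graph (illustration only)."""
    evaluations = 0  # counts how many nodes were actually computed

    def __init__(self, fn, *parents):
        self.fn, self.parents = fn, parents
        self._value, self._done = None, False

    def eval(self):
        if not self._done:
            args = [p.eval() for p in self.parents]  # evaluate dependencies first
            self._value = self.fn(*args)
            self._done = True
            Lazy.evaluations += 1
        return self._value

a = Lazy(lambda: [1.0, 2.0, 3.0])
b = Lazy(lambda x: [2 * v for v in x], a)         # needed by `total`
unused = Lazy(lambda x: [v ** 2 for v in x], a)   # never requested
total = Lazy(sum, b)

print(Lazy.evaluations)  # 0: building the graph computes nothing
print(total.eval())      # 12.0
print(Lazy.evaluations)  # 3: a, b and total ran; `unused` was skipped
```

In MLX itself, arrays are materialized when mx.eval is called or when a value is actually needed, for example when printing.
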
See docs/mlx/Kernel-Guide.md for working mx.fast.metal_kernel patterns (body-only + header), grid/threadgroup sizing, and autoswitching strategies. See docs/mlx/Orthogonality.md for non-square orthonormalization.
Fast GEMM (A@V and Aᵀ@B) quick start
- Enable kernels: METALFAISS_USE_GEMM_KERNEL=1
- Optional: METALFAISS_GEMM_TILE_SQ=16 METALFAISS_GEMM_V4=1 METALFAISS_GEMM_PAD_ATB=1
- Rebuild after toggles change: from metalfaiss.faissmlx.kernels import gemm_kernels as gk; gk.reset_gemm_kernels()
- Validate: python -m python.metalfaiss.unittest.test_gemm_flags_correctness
- Bench: python -m python.metalfaiss.unittest.test_kernel_autotune_bench
More: docs/mlx/GEMM-Kernels.md.
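
When experimenting with the flags from inside a Python session rather than the shell, a small context manager can set and restore them. This is a sketch: gemm_flags is a hypothetical helper, not part of metalfaiss, and it only manages environment variables. Per the quick start above, the kernels still have to be rebuilt after a toggle changes.

```python
import os
from contextlib import contextmanager

@contextmanager
def gemm_flags(**flags):
    """Temporarily set METALFAISS_* variables, then restore the old values.

    Hypothetical helper for illustration; not provided by metalfaiss.
    """
    names = {f"METALFAISS_{key}": str(value) for key, value in flags.items()}
    saved = {name: os.environ.get(name) for name in names}
    os.environ.update(names)
    try:
        yield
    finally:
        for name, old in saved.items():
            if old is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old

with gemm_flags(USE_GEMM_KERNEL=1, GEMM_TILE_SQ=16):
    # In a real session, rebuild the kernels here as described above:
    #   from metalfaiss.faissmlx.kernels import gemm_kernels as gk
    #   gk.reset_gemm_kernels()
    print(os.environ["METALFAISS_GEMM_TILE_SQ"])  # 16
```
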
Attribution: Some kernel patterns and HPC techniques are adapted from the Ember ML project by Sydney Bach (The Solace Project). We've encoded those real-world lessons here so others can build reliable MLX+Metal kernels.
See the python/ directory for complete examples:
- basic_usage.py: Basic usage patterns
- advanced_usage.py: Complex scenarios and optimizations
Core classes:
- FlatIndex: Flat (exhaustive search) index
- MetricType: Distance metric enumeration
- SearchResult: K-NN search results
- SearchRangeResult: Range search results
Utility functions:
- load_data(): Load vectors from file
- create_matrix(): Create random matrices
- normalize_data(): Normalize vectors to unit length
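
To show what an exhaustive (flat) index and its two result types represent, here is a pure-Python sketch of k-NN search and range search over the quick-start vectors. The function names are illustrative and not the metalfaiss API. The use of squared L2 distances follows the original FAISS convention and is an assumption about MetalFaiss.

```python
def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def flat_search(vectors, query, k):
    """Exhaustive k-NN: score every stored vector, keep the k closest."""
    scored = sorted((squared_l2(v, query), i) for i, v in enumerate(vectors))
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]   # distances, labels

def flat_range_search(vectors, query, radius):
    """Return (distance, label) for every vector within the radius."""
    hits = []
    for i, v in enumerate(vectors):
        d = squared_l2(v, query)
        if d <= radius:
            hits.append((d, i))
    return hits

embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9],
              [1.0, 1.1, 1.2], [1.3, 1.4, 1.5]]
query = [0.1, 0.5, 0.9]

distances, labels = flat_search(embeddings, query, k=3)
print(distances, labels)
print(flat_range_search(embeddings, query, radius=0.5))
```

A k-NN search always returns exactly k results per query, while a range search returns a variable number, which is why the two need separate result types.
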
Licensed under the Apache License, Version 2.0. See LICENSE.md for details.
This implementation was created by Sydney Bach for The Solace Project. Some design patterns were inspired by prior Swift work on FAISS, but this repository contains only the Python + MLX implementation.
Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.
- Python 3.8+
- MLX: Apple's machine learning framework
pip install mlx
- Clone the repository:
git clone https://github.com/SolaceHarmony/MetalFaiss.git
cd MetalFaiss
- Install the Python package:
cd python
pip install -e .
- Verify installation:
import metalfaiss
print(f"Metal FAISS version: {metalfaiss.__version__}")
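
Because there is no fallback path, it can help to check the machine before importing the package. The sketch below uses only the standard library. check_environment is a hypothetical helper, not something metalfaiss provides, and the Apple Silicon test assumes a native arm64 Python (under Rosetta, platform.machine() reports x86_64).

```python
import importlib.util
import platform
import sys

def check_environment():
    """Report whether this machine looks able to run an MLX + Metal package.

    Hypothetical helper for illustration; not part of metalfaiss.
    """
    return {
        "python_3_8_plus": sys.version_info >= (3, 8),
        "apple_silicon": platform.system() == "Darwin" and platform.machine() == "arm64",
        "mlx_installed": importlib.util.find_spec("mlx") is not None,
        "metalfaiss_installed": importlib.util.find_spec("metalfaiss") is not None,
    }

status = check_environment()
for name, ok in status.items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```
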
pip install mlx # Dependency
git clone https://github.com/SolaceHarmony/MetalFaiss.git
cd MetalFaiss/python && pip install -e .
cd python
python -m unittest discover metalfaiss.unittest -v
git clone https://github.com/SolaceHarmony/MetalFaiss.git
cd MetalFaiss/python
pip install -e . # Editable install
MetalFaiss/
├── python/                    # Python Metal FAISS implementation
│   ├── metalfaiss/            # Main package
│   │   ├── __init__.py        # Package initialization
│   │   ├── indexflat.py       # Flat index implementation
│   │   ├── metric_type.py     # Distance metrics
│   │   └── ...
│   ├── basic_usage.py         # Basic usage example
│   ├── advanced_usage.py      # Advanced scenarios and optimizations
│   └── setup.py               # Package setup
└── README.md                  # This file
- FlatIndex: Exact similarity search using brute force
- IVFIndex: Inverted file index for faster approximate search
- MetricType: Distance metrics (L2, InnerProduct, L1, Linf)
- VectorTransform: Data preprocessing (PCA, normalization, etc.)
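
The difference between exact and inverted-file search can be sketched in a few lines of plain Python. This toy version is not the IVFIndex implementation: the centroids are fixed by hand (a real index trains them, typically with k-means), and all function names are illustrative. It shows how scanning only nprobe lists trades accuracy for speed.

```python
def sq_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Put each vector id in the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_l2(v, centroids[c]))
        lists[nearest].append(i)
    return lists

def ivf_search(vectors, centroids, lists, query, k, nprobe):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: sq_l2(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in lists[c]]
    return sorted(candidates, key=lambda i: sq_l2(vectors[i], query))[:k]

vectors = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0], [0.45, 0.5]]
centroids = [[0.0, 0.0], [1.0, 1.0]]   # hand-picked for the example
lists = build_ivf(vectors, centroids)
query = [0.6, 0.6]

print(lists)                                                       # [[0, 1, 4], [2, 3]]
print(ivf_search(vectors, centroids, lists, query, k=2, nprobe=1))  # [3, 2]
print(ivf_search(vectors, centroids, lists, query, k=2, nprobe=2))  # [4, 3]
```

With nprobe=1 the true nearest neighbor (id 4) is missed because it sits in the list that was not probed. Probing both lists recovers it, at the cost of scanning every vector.
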
import metalfaiss
import mlx.core as mx
# Create index
index = metalfaiss.FlatIndex(d=128, metric_type=metalfaiss.MetricType.L2)
# Add vectors (MLX arrays)
vectors = mx.random.normal(shape=(1000, 128)).astype(mx.float32)
index.add(vectors)
# Search
query = mx.random.normal(shape=(1, 128)).astype(mx.float32)
result = index.search(query, k=5)
print(f"Distances: {result.distances}")
print(f"Indices: {result.labels}")
# Apply PCA transform (when available)
try:
    transform = metalfaiss.PCAMatrixTransform(d_in=128, d_out=64)
    transform.train(training_data)
    transformed_data = transform.apply(data)
except AttributeError:
    print("PCA transform not yet implemented")
MetalFaiss is MLX-only and targets Apple Silicon with Metal acceleration. No NumPy fallback is provided.
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Follow PEP 8 style guidelines
- Add tests for new functionality
- Update documentation as needed
- Ensure MLX compatibility
Thanks to all contributors who have helped build Metal FAISS:
- Sydney Renee - Core Python implementation and MLX integration
Want to contribute? Check out our Contributing Guide!
- FAISS Documentation - Original FAISS library
- MLX Documentation - Apple's MLX framework
- FAISS: The Missing Manual - Comprehensive FAISS guide
- Implementation Status - Current feature completeness
- Facebook Research - Original FAISS library and research
- Apple MLX Team - MLX framework enabling Metal acceleration
- FAISS Community - For the foundational algorithms and research
⭐ Star this repo if MetalFaiss helped you! ⭐
🐛 Report Bug • ✨ Request Feature • 💬 Discussions
Made with ❤️ by The Solace Project dev team
idx = metalfaiss.index_factory(128, "Flat")         # Flat exact search
idx = metalfaiss.index_factory(128, "IVF100,Flat")  # IVF (coarse k-means) + Flat
idx = metalfaiss.index_factory(128, "HNSW32")       # HNSW graph index
idx = metalfaiss.index_factory(128, "PQ8")          # Product Quantizer index
idx = metalfaiss.index_factory(128, "SQ8")          # Scalar Quantizer index
idx = metalfaiss.index_factory(128, "IDMap,Flat")   # External ID mapping wrapper
idx = metalfaiss.index_factory(128, "IDMap2,Flat")  # Two-way ID map wrapper
idx = metalfaiss.index_factory(128, "PCA32,Flat")   # Preprocessing + base index
idx = metalfaiss.index_factory(128, "RFlat")        # Refine(Flat) alias
key = metalfaiss.reverse_factory(idx)
print(key)  # e.g., "IVF100,Flat"
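
The description strings above follow a simple pattern: comma-separated components, each a name with an optional trailing number. The sketch below splits such a string lexically. It is a hypothetical helper for reading these keys, not how metalfaiss.index_factory parses them, and the real grammar may accept forms this does not handle.

```python
import re

# Names whose trailing digit belongs to the name rather than being a parameter
# (assumption based on the "IDMap2" entry in the list above).
WHOLE_NAMES = {"IDMap2"}

def parse_factory_string(description):
    """Split e.g. "IVF100,Flat" into [("IVF", 100), ("Flat", None)].

    Hypothetical helper for illustration; not part of metalfaiss.
    """
    parts = []
    for token in description.split(","):
        token = token.strip()
        if token in WHOLE_NAMES:
            parts.append((token, None))
            continue
        match = re.fullmatch(r"([A-Za-z]+)(\d*)", token)
        if match is None:
            raise ValueError(f"unrecognized component: {token!r}")
        name, number = match.groups()
        parts.append((name, int(number) if number else None))
    return parts

print(parse_factory_string("IVF100,Flat"))   # [('IVF', 100), ('Flat', None)]
print(parse_factory_string("IDMap2,Flat"))   # [('IDMap2', None), ('Flat', None)]
```
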
