VecPuff

VecPuff is a vector database built on top of S3, inspired by turbopuffer. It's designed for learning and experimentation with scalable vector search architectures.

I have written about the thought process and journey in this blog post.

Architecture

VecPuff uses a log-structured architecture:

WAL (Write-Ahead Log): writes are batched and persisted to S3 as WAL files
Compaction: a background process merges WAL files into indexed segments
ANN Index: an SPFresh-based approximate nearest neighbor (ANN) index for fast search
Local Cache: frequently accessed files are cached on local disk (see LOCAL_CACHE_PATH) to avoid repeated S3 reads
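SPFresh builds on SPANN, which groups vectors into posting lists around centroids so that a query only scans the lists whose centroids are closest to it. The sketch below shows that core idea in memory. It is a simplified illustration, not VecPuff's index: the centroids are fixed up front, there is no SPFresh-style incremental rebalancing (splitting or reassigning posting lists as data changes), and `PartitionedIndex` and `n_probe` are hypothetical names.

```rust
/// Squared Euclidean distance between two equal-length vectors.
fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Hypothetical toy index (not VecPuff's SPFresh implementation):
/// each centroid owns a posting list of (id, vector) pairs.
struct PartitionedIndex {
    centroids: Vec<Vec<f32>>,
    postings: Vec<Vec<(String, Vec<f32>)>>,
}

impl PartitionedIndex {
    fn new(centroids: Vec<Vec<f32>>) -> Self {
        let postings = vec![Vec::new(); centroids.len()];
        Self { centroids, postings }
    }

    /// Position of the centroid closest to `v`.
    fn nearest_centroid(&self, v: &[f32]) -> usize {
        let mut best = 0;
        for (i, c) in self.centroids.iter().enumerate() {
            if l2_squared(v, c) < l2_squared(v, &self.centroids[best]) {
                best = i;
            }
        }
        best
    }

    /// Append the vector to the posting list of its nearest centroid.
    fn insert(&mut self, id: &str, v: Vec<f32>) {
        let p = self.nearest_centroid(&v);
        self.postings[p].push((id.to_string(), v));
    }

    /// Probe the `n_probe` closest posting lists and return the `top_k` nearest ids.
    fn search(&self, query: &[f32], top_k: usize, n_probe: usize) -> Vec<String> {
        // Rank partitions by centroid distance to the query.
        let mut order: Vec<usize> = (0..self.centroids.len()).collect();
        order.sort_by(|&a, &b| {
            l2_squared(query, &self.centroids[a])
                .total_cmp(&l2_squared(query, &self.centroids[b]))
        });
        // Scan only the probed lists; everything else is skipped.
        let mut hits: Vec<(f32, &str)> = order
            .iter()
            .take(n_probe)
            .flat_map(|&p| self.postings[p].iter())
            .map(|(id, v)| (l2_squared(query, v), id.as_str()))
            .collect();
        hits.sort_by(|a, b| a.0.total_cmp(&b.0));
        hits.into_iter().take(top_k).map(|(_, id)| id.to_string()).collect()
    }
}
```

Raising `n_probe` scans more posting lists and trades latency for recall; probing every list degenerates into an exact scan.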

Building

# Build release binary
cargo build --release --bin server

# Run server
./target/release/server

The server will start on http://localhost:3000 by default.

Configuration

VecPuff reads its settings from config.toml. See that file for the full list of available options.

Environment Variables

S3_ENDPOINT: S3 endpoint URL (default: https://t3.storage.dev)
S3_REGION: S3 region (default: sin, or the value set in config.toml)
LOCAL_CACHE_PATH: Local cache directory (default: ./data/cache)
RUST_LOG: Logging level (e.g., info, debug)
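Each variable above has a built-in default, and S3_REGION can also come from config.toml. A minimal sketch of that kind of layered lookup follows. The precedence it uses (environment variable, then config file, then default) is an assumption for illustration, and `resolve` and the `DEFAULT_*` constants are hypothetical names; only the default values come from the list above.

```rust
use std::env;

// Defaults as listed above.
const DEFAULT_S3_ENDPOINT: &str = "https://t3.storage.dev";
const DEFAULT_S3_REGION: &str = "sin";
const DEFAULT_CACHE_PATH: &str = "./data/cache";

/// Resolve one setting.
/// ASSUMPTION: environment variable wins, then the config.toml value, then the default.
fn resolve(var: &str, from_config: Option<&str>, default: &str) -> String {
    env::var(var)
        .ok()
        .or_else(|| from_config.map(String::from))
        .unwrap_or_else(|| default.to_string())
}
```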

Key Configuration Sections

[server]: HTTP server settings (timeouts, body size limits)
[limits]: Resource limits (max dimensions, document size, etc.)
[batching]: WAL batching configuration
[indexing]: Compaction triggers and thresholds
[storage]: S3 connection settings
[compactor]: Background compaction settings
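The [batching] section controls how writes are grouped into WAL files. A common policy is to flush a batch when it reaches a row limit or when its oldest write has waited too long; the sketch below implements that policy. It is an assumption about how such settings typically work, not VecPuff's code: `BatchingConfig`, `max_batch_rows`, `max_batch_age`, `WalBatcher`, and the `wal/<seq>.wal` key layout are all hypothetical, and the actual S3 upload is left out.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-ins for settings in the [batching] section.
struct BatchingConfig {
    max_batch_rows: usize,
    max_batch_age: Duration,
}

/// Buffers incoming writes and decides when to cut them into one WAL file.
struct WalBatcher {
    cfg: BatchingConfig,
    pending: Vec<String>,    // serialized rows waiting to be flushed
    oldest: Option<Instant>, // arrival time of the first pending row
    next_seq: u64,           // sequence number for the next WAL file
}

impl WalBatcher {
    fn new(cfg: BatchingConfig) -> Self {
        Self { cfg, pending: Vec::new(), oldest: None, next_seq: 0 }
    }

    /// Buffer a row; returns a flushed batch if the row limit is reached.
    fn push(&mut self, row: String, now: Instant) -> Option<(String, Vec<String>)> {
        if self.oldest.is_none() {
            self.oldest = Some(now);
        }
        self.pending.push(row);
        if self.pending.len() >= self.cfg.max_batch_rows {
            self.flush()
        } else {
            None
        }
    }

    /// Called periodically: flush if the oldest pending row has waited too long.
    fn tick(&mut self, now: Instant) -> Option<(String, Vec<String>)> {
        let expired = self
            .oldest
            .map_or(false, |t| now.duration_since(t) >= self.cfg.max_batch_age);
        if expired { self.flush() } else { None }
    }

    /// Hand off all pending rows as one WAL object (key layout is hypothetical).
    fn flush(&mut self) -> Option<(String, Vec<String>)> {
        if self.pending.is_empty() {
            return None;
        }
        // Zero-padding keeps lexicographic key order equal to write order.
        let key = format!("wal/{:020}.wal", self.next_seq);
        self.next_seq += 1;
        self.oldest = None;
        Some((key, std::mem::take(&mut self.pending)))
    }
}
```

Zero-padded sequence numbers matter on S3 because object listings come back in lexicographic key order, so a reader replaying the WAL sees files in the order they were written.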

API Reference

Health Check

GET /health

Returns OK if the server is running.

List Namespaces

GET /namespaces

Returns a list of all namespaces.

Response:

{
  "namespaces": [{ "id": "my-namespace" }]
}

Upsert Vectors

POST /namespaces/{namespace}

Insert or update vectors in a namespace.

Request Body:

{
  "distance_metric": "cosine_distance",  // optional: "cosine_distance" or "euclidean_squared"
  "upsert_rows": [
    {
      "id": "doc1",
      "vector": [0.1, 0.2, 0.3, ...],
      "text": "Sample document",
      "category": "tech",
      "score": 42
    }
  ],
  "patch_rows": [...],  // optional: partial updates
  "deletes": ["doc-id"]  // optional: delete by ID
}

Response:

{
  "upserted_count": 1
}

Row Format:

id: Unique document identifier (string)
vector: array of floats (the embedding)
Additional fields: Any JSON-serializable metadata
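One request can carry upserts, patches, and deletes together. The sketch below models their effect on an in-memory namespace: an upsert replaces or creates the whole row, a patch overwrites only the fields it names, and a delete removes the row by ID. Two points are assumptions rather than documented behavior: the order of application (upserts, then patches, then deletes) and that patching a missing ID is a no-op. `Row`, `Patch`, and `apply_write` are hypothetical names, and attribute values are kept as raw JSON text to stay within the standard library.

```rust
use std::collections::{BTreeMap, HashMap};

/// A stored row: the vector plus free-form attributes (values kept as JSON text).
#[derive(Clone, Debug, PartialEq)]
struct Row {
    vector: Vec<f32>,
    attrs: BTreeMap<String, String>,
}

/// A patch touches only the attributes it names; the vector is left as-is.
struct Patch {
    id: String,
    attrs: BTreeMap<String, String>,
}

/// Apply one write request; returns the number of upserted rows.
/// ASSUMPTION: upserts, then patches, then deletes are applied in that order.
fn apply_write(
    ns: &mut HashMap<String, Row>,
    upserts: Vec<(String, Row)>,
    patches: Vec<Patch>,
    deletes: Vec<String>,
) -> usize {
    let upserted = upserts.len();
    for (id, row) in upserts {
        ns.insert(id, row); // replace the whole row, or create it
    }
    for p in patches {
        // ASSUMPTION: patching an ID that does not exist is a no-op.
        if let Some(row) = ns.get_mut(&p.id) {
            row.attrs.extend(p.attrs); // overwrite only the named fields
        }
    }
    for id in deletes {
        ns.remove(&id);
    }
    upserted
}
```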

Query Vectors

POST /namespaces/{namespace}/query

Search for similar vectors.

Request Body:

{
  "query_vector": [0.1, 0.2, 0.3, ...],
  "top_k": 10
}

Response:

{
  "results": [
    {
      "id": "doc1",
      "vector": [0.1, 0.2, 0.3, ...],
      "text": "Sample document",
      "category": "tech"
    }
  ],
  "total_count": 1
}
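An exhaustive scan gives the exact answer that the ANN index approximates, which makes it a useful baseline for checking recall. The sketch below scores every row with the two metrics accepted by `distance_metric` and keeps the `top_k` smallest distances. It assumes results are ordered by ascending distance and that cosine distance means 1 minus cosine similarity; `Metric`, `distance`, and `exact_top_k` are hypothetical names.

```rust
/// The two metrics accepted in `distance_metric`.
#[derive(Clone, Copy)]
enum Metric {
    CosineDistance,
    EuclideanSquared,
}

/// Smaller is closer for both metrics. Cosine assumes non-zero vectors.
fn distance(metric: Metric, a: &[f32], b: &[f32]) -> f32 {
    match metric {
        Metric::EuclideanSquared => a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>(),
        Metric::CosineDistance => {
            let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
            let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
            let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
            1.0 - dot / (na * nb) // 0 = same direction, 2 = opposite
        }
    }
}

/// Exhaustive top-k: score every row, sort by distance, keep the first `top_k` ids.
fn exact_top_k<'a>(
    metric: Metric,
    query: &[f32],
    rows: &'a [(String, Vec<f32>)],
    top_k: usize,
) -> Vec<&'a str> {
    let mut scored: Vec<(f32, &str)> = rows
        .iter()
        .map(|(id, v)| (distance(metric, query, v), id.as_str()))
        .collect();
    scored.sort_by(|a, b| a.0.total_cmp(&b.0));
    scored.into_iter().take(top_k).map(|(_, id)| id).collect()
}
```

Recall for a query can then be estimated as the fraction of these exact ids that also appear in the server's `results`.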

Get Metadata

POST /namespaces/{namespace}/metadata

Get namespace metadata including index status and row counts.

Response:

{
  "approx_row_count": 1000,
  "index": {
    "status": "up_to_date",
    "indexed_row_count": 1000,
    "ann_index_file": "index/ann_index.bin"
  },
  "created_at": 1234567890,
  "updated_at": 1234567890
}
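Rows still sitting in WAL files have not yet been merged into the ANN index, so the gap between `approx_row_count` and `indexed_row_count` is a rough measure of how far compaction is behind. The sketch below computes that gap and applies a threshold of the kind the [indexing] section describes. `IndexCounts`, `unindexed_rows`, `should_compact`, and the threshold parameter are hypothetical; the real trigger logic and its settings live in VecPuff's config and code.

```rust
/// Counts mirrored from the metadata response.
struct IndexCounts {
    approx_row_count: u64,
    indexed_row_count: u64,
}

/// Rows written but not yet folded into the ANN index.
/// `approx_row_count` is approximate, so clamp at zero instead of underflowing.
fn unindexed_rows(c: &IndexCounts) -> u64 {
    c.approx_row_count.saturating_sub(c.indexed_row_count)
}

/// Hypothetical trigger: compact once the unindexed tail reaches a threshold.
fn should_compact(c: &IndexCounts, max_unindexed_rows: u64) -> bool {
    unindexed_rows(c) >= max_unindexed_rows
}
```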
