local-rag — Distributed Memory + Code RAG for Claude Code

Semantic memory and code intelligence as an MCP plugin for Claude Code agents. 11 tools that give Claude persistent memory, semantic code search, import graph traversal, and symbol-level navigation — all running locally.

What it does

Tool	Description
`recall(query)`	Semantic search across stored memories
`remember(content)`	Store memory with type / scope / tags / importance
`search_code(query)`	Hybrid RAG over indexed codebase (4 modes, reranker, name filter)
`get_symbol(symbol_id)`	Retrieve a symbol by UUID — direct Qdrant lookup, no file I/O
`find_usages(symbol_id)`	Find callers/references of a symbol (lexical + semantic, self-excluded)
`get_file_context(file_path)`	Read file + list indexed symbols with UUIDs for `get_symbol`/`find_usages`
`get_dependencies(file_path)`	Import graph traversal (forward / reverse / transitive)
`project_overview()`	3-level directory tree, entry points, top imports
`forget(memory_id)`	Delete a memory permanently
`consolidate()`	Merge semantically similar memories
`stats()`	Memory and index statistics

Stack

Qdrant — vector database (Rust, production-ready)
Ollama — local embeddings (embeddinggemma:300m)
tree-sitter — multi-language code parser (TypeScript, JavaScript, Go, Rust)
MCP — Model Context Protocol (stdio transport)

Prerequisites

1. Ollama (local embeddings)

Install: https://ollama.com/download

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS — download the app from:
# https://ollama.com/download/mac

# Windows — download the installer from:
# https://ollama.com/download/windows

Pull the embedding model:

ollama pull embeddinggemma:300m

2. Qdrant (vector database)

Option A — Docker Compose (recommended)

A ready-to-use docker-compose.yml is included in this repo:

docker compose up -d

Exposes port 6333 (REST) and 6334 (gRPC). Data persists in a named volume qdrant-data.

Option B — Docker run

docker run -d --name qdrant \
  -p 6333:6333 -p 6334:6334 \
  -v qdrant-data:/qdrant/storage \
  qdrant/qdrant

Option C — Qdrant Cloud

https://cloud.qdrant.io/ — set qdrant-url in .memory.json to your cluster endpoint.

3. Node.js 18+

https://nodejs.org/

Installation

From npm (recommended):

npm install -g @13w/local-rag

From source:

git clone https://github.com/13W/local-rag.git
cd local-rag
npm install && npm run build

Claude Code Plugin Setup

Install local-rag

Option A — claude mcp add with npx (no global install needed)

Per-project (stored in .mcp.json, shared with the team):

claude mcp add memory -- npx -y @13w/local-rag serve --config .memory.json

Global — available in all projects on this machine:

claude mcp add memory -s user -- npx -y @13w/local-rag serve --config .memory.json

Option B — .mcp.json directly

{
  "mcpServers": {
    "memory": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@13w/local-rag", "serve", "--config", ".memory.json"]
    }
  }
}

Option C — After global npm install -g

claude mcp add memory -- local-rag serve --config .memory.json

Install Serena (recommended companion)

Serena provides filesystem access and precise symbolic code editing that complements local-rag: local-rag finds code by meaning, Serena reads and edits it surgically.

Repo: https://github.com/oraios/serena

Requirements: Python 3.10+, uv

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Register Serena as a Claude Code plugin (per-project)
claude mcp add serena -- uvx --from serena serena-mcp-server --context ide-assistant --project .

Or in .mcp.json:

{
  "mcpServers": {
    "serena": {
      "type": "stdio",
      "command": "uvx",
      "args": ["--from", "serena", "serena-mcp-server", "--context", "ide-assistant", "--project", "."]
    }
  }
}

Combined `.mcp.json` (both plugins)

{
  "mcpServers": {
    "memory": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@13w/local-rag", "serve", "--config", ".memory.json"]
    },
    "serena": {
      "type": "stdio",
      "command": "uvx",
      "args": ["--from", "serena", "serena-mcp-server", "--context", "ide-assistant", "--project", "."]
    }
  }
}

Agent workflow setup

Run init once in your project root after registering the MCP plugin. It installs hooks that enforce the recall → search → remember protocol on every session and prompt, and writes reference guides into .claude/rules/ so Claude always has the tool conventions at hand.

npx @13w/local-rag init

# If installed globally
local-rag init

Output:

wrote  .claude/hooks/session-start.sh
wrote  .claude/hooks/prompt-reminder.sh
wrote  .claude/settings.json
wrote  .claude/settings.local.json
wrote  .claude/rules/continuous-remember.md
wrote  .claude/rules/memory-protocol-reference.md
wrote  .claude/rules/serena-conventions.md

What each file does:

File	Purpose
`hooks/session-start.sh`	Injects the full protocol cheatsheet as a `system-reminder` at every session start and after context compaction
`hooks/prompt-reminder.sh`	Fires on every user prompt — reminds Claude to `recall()` before acting and `remember()` after
`rules/continuous-remember.md`	When and how to call `remember()` immediately (trigger events, format, anti-patterns)
`rules/memory-protocol-reference.md`	Full tool reference with parameter tables and call examples
`rules/serena-conventions.md`	Serena vs Memory MCP routing guide and end-to-end editing workflow
`settings.json`	Registers the hooks in Claude Code (commit this)
`settings.local.json`	Local hook overrides — add to `.gitignore`

Commit .claude/hooks/, .claude/rules/, and .claude/settings.json to share the workflow with your team.

Configuration

Create .memory.json in your project root (auto-discovered if present):

{
  "project-id": "my-project",
  "project-root": ".",
  "qdrant-url": "http://localhost:6333",
  "embed-provider": "ollama",
  "embed-model": "embeddinggemma:300m",
  "ollama-url": "http://localhost:11434"
}

Full config reference

Key	Default	Description
`project-id`	`"default"`	Isolates memories and code index per project
`project-root`	config file directory	Root path for code indexing
`qdrant-url`	`http://localhost:6333`	Qdrant REST API URL
`embed-provider`	`"ollama"`	Embedding provider: `ollama`, `openai`, `voyage`
`embed-model`	provider default¹	Embedding model name
`embed-dim`	`1024`	Embedding vector dimension
`embed-api-key`	`""`	API key for OpenAI / Voyage embed providers — falls back to `OPENAI_API_KEY` / `VOYAGE_API_KEY` env var
`embed-url`	`""`	Custom embedding API endpoint
`ollama-url`	`http://localhost:11434`	Ollama API URL
`agent-id`	`"default"`	Agent identifier (for multi-agent setups)
`llm-provider`	`"ollama"`	LLM provider: `ollama`, `anthropic`, `openai`
`llm-model`	provider default²	LLM model for reranking / description generation
`llm-api-key`	`""`	API key for Anthropic / OpenAI LLM providers — falls back to `ANTHROPIC_API_KEY` / `OPENAI_API_KEY` env var
`llm-url`	`""`	Custom LLM API endpoint
`include-paths`	`[]`	Glob patterns to limit indexing scope (monorepos)
`generate-descriptions`	`false`	Auto-generate LLM descriptions for code chunks (slow)
`dashboard`	`true`	Enable the live dashboard HTTP server
`dashboard-port`	`0`	Dashboard HTTP port; `0` lets the OS pick a random port
`collection-prefix`	`""`	String prepended to all Qdrant collection names (useful on shared Qdrant instances)
`no-watch`	`false`	Disable automatic file re-indexing when files change (applies during `serve`)

¹ embed-model defaults: ollama → embeddinggemma:300m, openai → text-embedding-3-small, voyage → voyage-code-3

² llm-model defaults: ollama → gemma3n:e2b, anthropic → claude-haiku-4-5-20251001, openai → gpt-4o-mini

Resolution order (highest to lowest priority): CLI flag → .memory.json value → environment variable → built-in default.

API key environment variables are provider-specific:

Provider embed-api-key env var llm-api-key env var

openai OPENAI_API_KEY OPENAI_API_KEY

voyage VOYAGE_API_KEY —

anthropic — ANTHROPIC_API_KEY

All other keys can also be passed as CLI flags (e.g. --project-id foo). CLI flags override config file values. include-paths is config-file only.

search_code — search modes and reranker

search_code supports four modes via the search_mode parameter:

Mode	Description
`hybrid` (default)	3-way RRF fusion: code vector + description vector + lexical text leg
`code`	Code vector only — exact structural similarity
`semantic`	Description vector only — conceptual search when you don't know the name
`lexical`	Text index filter — only chunks where query terms literally appear in name or content

Cross-encoder reranker

After vector retrieval, an optional cross-encoder pass (Xenova/bge-reranker-base) re-scores and reorders results for higher precision:

search_code("embedOne", rerank=true, rerank_k=50, top=5)
# Fetches 50 ANN candidates, scores all 50 with the cross-encoder, returns top 5

Parameter	Default	Description
`rerank`	`false`	Enable cross-encoder reranking
`rerank_k`	`50`	ANN candidates to fetch before reranking
`top`	`limit`	Results to return after reranking

Symbol name filter

search_code("embed vector", name_pattern="embed")
# Only returns chunks whose name contains "embed" (prefix-tokenized index)

Symbol-aware workflow

Every symbol UUID surface (search_code, get_file_context) feeds directly into the two symbol tools:

# From search
search_code("parse imports typescript")
# → id:  abc-123-...  file: src/parser.ts  name: extractImports

# From file listing
get_file_context("src/parser.ts")
# → function  extractImports  (lines 248–264)  id: abc-123-...

# Read the symbol directly (no file I/O)
get_symbol("abc-123-...")

# Find all callers / references
find_usages("abc-123-...", limit=20)
# Returns [lexical] hits (literal name match) + [semantic] hits (conceptual match), self-excluded

Indexing Your Codebase

Before search_code and get_file_context tools return results, index the project:

# Index once
npx @13w/local-rag index . --config .memory.json

# Watch mode — re-indexes on file changes
npx @13w/local-rag watch . --config .memory.json

# If installed globally
local-rag index . --config .memory.json
local-rag watch . --config .memory.json

Other indexer commands:

local-rag clear --config .memory.json    # remove all indexed chunks
local-rag stats --config .memory.json    # show collection statistics
local-rag file <abs-path> <root>         # index a single file
local-rag repair . --config .memory.json # fix empty symbol names (payload-only, no re-embedding)

repair is useful after updating to a version with improved parser extraction logic: it patches only the name field for affected chunks without regenerating embeddings or descriptions.

Live Dashboard

local-rag serve automatically opens a browser dashboard on a local HTTP port. It displays real-time tool call statistics (calls, bytes, latency, errors per tool), a scrolling request log, a server info bar (project, branch, version, watch status), and an interactive tool playground for testing calls manually.

The port is OS-assigned by default (printed to stderr as [dashboard] http://localhost:PORT). To use a fixed port or disable the dashboard:

{ "dashboard-port": 4242 }
{ "dashboard": false }

Memory Types

Type	Use for	Decay
`episodic`	Events, bugs, incidents	Time-decayed
`semantic`	Facts, architecture, decisions	Long-lived
`procedural`	Patterns, conventions, how-to	Long-lived

Agent Protocol

Run local-rag init (see Agent workflow setup) to install the full RECALL → SEARCH_CODE → THINK → ACT → REMEMBER protocol into your project. The hooks fire automatically — no manual prompting required.

Name		Name	Last commit message	Last commit date
Latest commit History 59 Commits
.claude		.claude
scripts		scripts
src		src
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
CLAUDE_EXAMPLE.md		CLAUDE_EXAMPLE.md
README.md		README.md
biome.jsonc		biome.jsonc
docker-compose.yml		docker-compose.yml
logo.svg		logo.svg
package.json		package.json
pnpm-lock.yaml		pnpm-lock.yaml
pnpm-workspace.yaml		pnpm-workspace.yaml
tsconfig.bench.json		tsconfig.bench.json
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

local-rag — Distributed Memory + Code RAG for Claude Code

What it does

Stack

Prerequisites

1. Ollama (local embeddings)

2. Qdrant (vector database)

3. Node.js 18+

Installation

Claude Code Plugin Setup

Install local-rag

Install Serena (recommended companion)

Combined `.mcp.json` (both plugins)

Agent workflow setup

Configuration

Full config reference

search_code — search modes and reranker

Cross-encoder reranker

Symbol name filter

Symbol-aware workflow

Indexing Your Codebase

Live Dashboard

Memory Types

Agent Protocol

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Provider	`embed-api-key` env var	`llm-api-key` env var
`openai`	`OPENAI_API_KEY`	`OPENAI_API_KEY`
`voyage`	`VOYAGE_API_KEY`	—
`anthropic`	—	`ANTHROPIC_API_KEY`

Folders and files

Latest commit

History

Repository files navigation

local-rag — Distributed Memory + Code RAG for Claude Code

What it does

Stack

Prerequisites

1. Ollama (local embeddings)

2. Qdrant (vector database)

3. Node.js 18+

Installation

Claude Code Plugin Setup

Install local-rag

Install Serena (recommended companion)

Combined .mcp.json (both plugins)

Agent workflow setup

Configuration

Full config reference

search_code — search modes and reranker

Cross-encoder reranker

Symbol name filter

Symbol-aware workflow

Indexing Your Codebase

Live Dashboard

Memory Types

Agent Protocol

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Combined `.mcp.json` (both plugins)

Packages