Skip to content

13W/local-rag

Repository files navigation

local-rag logo

local-rag — Distributed Memory + Code RAG for Claude Code

npm GitHub

Semantic memory and code intelligence as an MCP plugin for Claude Code agents. 11 tools that give Claude persistent memory, semantic code search, import graph traversal, and symbol-level navigation — all running locally.

What it does

Tool Description
recall(query) Semantic search across stored memories
remember(content) Store memory with type / scope / tags / importance
search_code(query) Hybrid RAG over indexed codebase (4 modes, reranker, name filter)
get_symbol(symbol_id) Retrieve a symbol by UUID — direct Qdrant lookup, no file I/O
find_usages(symbol_id) Find callers/references of a symbol (lexical + semantic, self-excluded)
get_file_context(file_path) Read file + list indexed symbols with UUIDs for get_symbol/find_usages
get_dependencies(file_path) Import graph traversal (forward / reverse / transitive)
project_overview() 3-level directory tree, entry points, top imports
forget(memory_id) Delete a memory permanently
consolidate() Merge semantically similar memories
stats() Memory and index statistics

Stack

  • Qdrant — vector database (Rust, production-ready)
  • Ollama — local embeddings (embeddinggemma:300m)
  • tree-sitter — multi-language code parser (TypeScript, JavaScript, Go, Rust)
  • MCP — Model Context Protocol (stdio transport)

Prerequisites

1. Ollama (local embeddings)

Install: https://ollama.com/download

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS — download the app from:
# https://ollama.com/download/mac

# Windows — download the installer from:
# https://ollama.com/download/windows

Pull the embedding model:

ollama pull embeddinggemma:300m

2. Qdrant (vector database)

Option A — Docker Compose (recommended)

A ready-to-use docker-compose.yml is included in this repo:

docker compose up -d

Exposes port 6333 (REST) and 6334 (gRPC). Data persists in a named volume qdrant-data.

Option B — Docker run

docker run -d --name qdrant \
  -p 6333:6333 -p 6334:6334 \
  -v qdrant-data:/qdrant/storage \
  qdrant/qdrant

Option C — Qdrant Cloud

https://cloud.qdrant.io/ — set qdrant-url in .memory.json to your cluster endpoint.

3. Node.js 18+

https://nodejs.org/


Installation

From npm (recommended):

npm install -g @13w/local-rag

From source:

git clone https://github.com/13W/local-rag.git
cd local-rag
npm install && npm run build

Claude Code Plugin Setup

Install local-rag

Option A — claude mcp add with npx (no global install needed)

Per-project (stored in .mcp.json, shared with the team):

claude mcp add memory -- npx -y @13w/local-rag serve --config .memory.json

Global — available in all projects on this machine:

claude mcp add memory -s user -- npx -y @13w/local-rag serve --config .memory.json

Option B — .mcp.json directly

{
  "mcpServers": {
    "memory": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@13w/local-rag", "serve", "--config", ".memory.json"]
    }
  }
}

Option C — After global npm install -g

claude mcp add memory -- local-rag serve --config .memory.json

Install Serena (recommended companion)

Serena provides filesystem access and precise symbolic code editing that complements local-rag: local-rag finds code by meaning, Serena reads and edits it surgically.

Repo: https://github.com/oraios/serena

Requirements: Python 3.10+, uv

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Register Serena as a Claude Code plugin (per-project)
claude mcp add serena -- uvx --from serena serena-mcp-server --context ide-assistant --project .

Or in .mcp.json:

{
  "mcpServers": {
    "serena": {
      "type": "stdio",
      "command": "uvx",
      "args": ["--from", "serena", "serena-mcp-server", "--context", "ide-assistant", "--project", "."]
    }
  }
}

Combined .mcp.json (both plugins)

{
  "mcpServers": {
    "memory": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@13w/local-rag", "serve", "--config", ".memory.json"]
    },
    "serena": {
      "type": "stdio",
      "command": "uvx",
      "args": ["--from", "serena", "serena-mcp-server", "--context", "ide-assistant", "--project", "."]
    }
  }
}

Agent workflow setup

Run init once in your project root after registering the MCP plugin. It installs hooks that enforce the recall → search → remember protocol on every session and prompt, and writes reference guides into .claude/rules/ so Claude always has the tool conventions at hand.

npx @13w/local-rag init

# If installed globally
local-rag init

Output:

wrote  .claude/hooks/session-start.sh
wrote  .claude/hooks/prompt-reminder.sh
wrote  .claude/settings.json
wrote  .claude/settings.local.json
wrote  .claude/rules/continuous-remember.md
wrote  .claude/rules/memory-protocol-reference.md
wrote  .claude/rules/serena-conventions.md

What each file does:

File Purpose
hooks/session-start.sh Injects the full protocol cheatsheet as a system-reminder at every session start and after context compaction
hooks/prompt-reminder.sh Fires on every user prompt — reminds Claude to recall() before acting and remember() after
rules/continuous-remember.md When and how to call remember() immediately (trigger events, format, anti-patterns)
rules/memory-protocol-reference.md Full tool reference with parameter tables and call examples
rules/serena-conventions.md Serena vs Memory MCP routing guide and end-to-end editing workflow
settings.json Registers the hooks in Claude Code (commit this)
settings.local.json Local hook overrides — add to .gitignore

Commit .claude/hooks/, .claude/rules/, and .claude/settings.json to share the workflow with your team.


Configuration

Create .memory.json in your project root (auto-discovered if present):

{
  "project-id": "my-project",
  "project-root": ".",
  "qdrant-url": "http://localhost:6333",
  "embed-provider": "ollama",
  "embed-model": "embeddinggemma:300m",
  "ollama-url": "http://localhost:11434"
}

Full config reference

Key Default Description
project-id "default" Isolates memories and code index per project
project-root config file directory Root path for code indexing
qdrant-url http://localhost:6333 Qdrant REST API URL
embed-provider "ollama" Embedding provider: ollama, openai, voyage
embed-model provider default¹ Embedding model name
embed-dim 1024 Embedding vector dimension
embed-api-key "" API key for OpenAI / Voyage embed providers — falls back to OPENAI_API_KEY / VOYAGE_API_KEY env var
embed-url "" Custom embedding API endpoint
ollama-url http://localhost:11434 Ollama API URL
agent-id "default" Agent identifier (for multi-agent setups)
llm-provider "ollama" LLM provider: ollama, anthropic, openai
llm-model provider default² LLM model for reranking / description generation
llm-api-key "" API key for Anthropic / OpenAI LLM providers — falls back to ANTHROPIC_API_KEY / OPENAI_API_KEY env var
llm-url "" Custom LLM API endpoint
include-paths [] Glob patterns to limit indexing scope (monorepos)
generate-descriptions false Auto-generate LLM descriptions for code chunks (slow)
dashboard true Enable the live dashboard HTTP server
dashboard-port 0 Dashboard HTTP port; 0 lets the OS pick a random port
collection-prefix "" String prepended to all Qdrant collection names (useful on shared Qdrant instances)
no-watch false Disable automatic file re-indexing when files change (applies during serve)

¹ embed-model defaults: ollamaembeddinggemma:300m, openaitext-embedding-3-small, voyagevoyage-code-3

² llm-model defaults: ollamagemma3n:e2b, anthropicclaude-haiku-4-5-20251001, openaigpt-4o-mini

Resolution order (highest to lowest priority): CLI flag → .memory.json value → environment variable → built-in default.

API key environment variables are provider-specific:

Provider embed-api-key env var llm-api-key env var
openai OPENAI_API_KEY OPENAI_API_KEY
voyage VOYAGE_API_KEY
anthropic ANTHROPIC_API_KEY

All other keys can also be passed as CLI flags (e.g. --project-id foo). CLI flags override config file values. include-paths is config-file only.


search_code — search modes and reranker

search_code supports four modes via the search_mode parameter:

Mode Description
hybrid (default) 3-way RRF fusion: code vector + description vector + lexical text leg
code Code vector only — exact structural similarity
semantic Description vector only — conceptual search when you don't know the name
lexical Text index filter — only chunks where query terms literally appear in name or content

Cross-encoder reranker

After vector retrieval, an optional cross-encoder pass (Xenova/bge-reranker-base) re-scores and reorders results for higher precision:

search_code("embedOne", rerank=true, rerank_k=50, top=5)
# Fetches 50 ANN candidates, scores all 50 with the cross-encoder, returns top 5
Parameter Default Description
rerank false Enable cross-encoder reranking
rerank_k 50 ANN candidates to fetch before reranking
top limit Results to return after reranking

Symbol name filter

search_code("embed vector", name_pattern="embed")
# Only returns chunks whose name contains "embed" (prefix-tokenized index)

Symbol-aware workflow

Every symbol UUID surface (search_code, get_file_context) feeds directly into the two symbol tools:

# From search
search_code("parse imports typescript")
# → id:  abc-123-...  file: src/parser.ts  name: extractImports

# From file listing
get_file_context("src/parser.ts")
# → function  extractImports  (lines 248–264)  id: abc-123-...

# Read the symbol directly (no file I/O)
get_symbol("abc-123-...")

# Find all callers / references
find_usages("abc-123-...", limit=20)
# Returns [lexical] hits (literal name match) + [semantic] hits (conceptual match), self-excluded

Indexing Your Codebase

Before search_code and get_file_context tools return results, index the project:

# Index once
npx @13w/local-rag index . --config .memory.json

# Watch mode — re-indexes on file changes
npx @13w/local-rag watch . --config .memory.json

# If installed globally
local-rag index . --config .memory.json
local-rag watch . --config .memory.json

Other indexer commands:

local-rag clear --config .memory.json    # remove all indexed chunks
local-rag stats --config .memory.json    # show collection statistics
local-rag file <abs-path> <root>         # index a single file
local-rag repair . --config .memory.json # fix empty symbol names (payload-only, no re-embedding)

repair is useful after updating to a version with improved parser extraction logic: it patches only the name field for affected chunks without regenerating embeddings or descriptions.


Live Dashboard

local-rag serve automatically opens a browser dashboard on a local HTTP port. It displays real-time tool call statistics (calls, bytes, latency, errors per tool), a scrolling request log, a server info bar (project, branch, version, watch status), and an interactive tool playground for testing calls manually.

The port is OS-assigned by default (printed to stderr as [dashboard] http://localhost:PORT). To use a fixed port or disable the dashboard:

{ "dashboard-port": 4242 }
{ "dashboard": false }

Memory Types

Type Use for Decay
episodic Events, bugs, incidents Time-decayed
semantic Facts, architecture, decisions Long-lived
procedural Patterns, conventions, how-to Long-lived

Agent Protocol

Run local-rag init (see Agent workflow setup) to install the full RECALL → SEARCH_CODE → THINK → ACT → REMEMBER protocol into your project. The hooks fire automatically — no manual prompting required.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors