Semantic memory and code intelligence as an MCP plugin for Claude Code agents. 11 tools that give Claude persistent memory, semantic code search, import graph traversal, and symbol-level navigation — all running locally.
| Tool | Description |
|---|---|
| `recall(query)` | Semantic search across stored memories |
| `remember(content)` | Store a memory with type / scope / tags / importance |
| `search_code(query)` | Hybrid RAG over the indexed codebase (4 modes, reranker, name filter) |
| `get_symbol(symbol_id)` | Retrieve a symbol by UUID — direct Qdrant lookup, no file I/O |
| `find_usages(symbol_id)` | Find callers/references of a symbol (lexical + semantic, self-excluded) |
| `get_file_context(file_path)` | Read a file and list its indexed symbols with UUIDs for `get_symbol`/`find_usages` |
| `get_dependencies(file_path)` | Import graph traversal (forward / reverse / transitive) |
| `project_overview()` | 3-level directory tree, entry points, top imports |
| `forget(memory_id)` | Permanently delete a memory |
| `consolidate()` | Merge semantically similar memories |
| `stats()` | Memory and index statistics |
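A typical round trip through the memory tools might look like this (argument values are hypothetical; parameter names follow the descriptions above):

```
recall("auth token refresh")                 # search memory before acting
remember("Token refresh lives in src/auth.ts; skew tolerance is 30s",
         type="semantic", tags=["auth"])     # store a durable fact
stats()                                      # memory and index counts
```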
- Qdrant — vector database (Rust, production-ready)
- Ollama — local embeddings (`embeddinggemma:300m`)
- tree-sitter — multi-language code parser (TypeScript, JavaScript, Go, Rust)
- MCP — Model Context Protocol (stdio transport)
Install: https://ollama.com/download

```shell
# Linux
curl -fsSL https://ollama.com/install.sh | sh

# macOS — download the app from:
# https://ollama.com/download/mac

# Windows — download the installer from:
# https://ollama.com/download/windows
```

Pull the embedding model:

```shell
ollama pull embeddinggemma:300m
```

Option A — Docker Compose (recommended)

A ready-to-use `docker-compose.yml` is included in this repo:

```shell
docker compose up -d
```

Exposes ports 6333 (REST) and 6334 (gRPC). Data persists in a named volume `qdrant-data`.
Option B — Docker run

```shell
docker run -d --name qdrant \
  -p 6333:6333 -p 6334:6334 \
  -v qdrant-data:/qdrant/storage \
  qdrant/qdrant
```

Option C — Qdrant Cloud

https://cloud.qdrant.io/ — set `qdrant-url` in `.memory.json` to your cluster endpoint.
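Before wiring up the server, it can help to confirm both backends are reachable. A minimal sketch, assuming the default local ports:

```shell
# Check Qdrant and Ollama on their default local ports.
# Qdrant answers GET / with its version; Ollama answers GET /api/version.
check() {
  if curl -sf "$2" >/dev/null 2>&1; then
    echo "$1 reachable"
  else
    echo "$1 not reachable"
  fi
}
result="$(check qdrant http://localhost:6333/
check ollama http://localhost:11434/api/version)"
echo "$result"
```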
From npm (recommended):

```shell
npm install -g @13w/local-rag
```

From source:

```shell
git clone https://github.com/13W/local-rag.git
cd local-rag
npm install && npm run build
```

Option A — `claude mcp add` with npx (no global install needed)
Per-project (stored in `.mcp.json`, shared with the team):

```shell
claude mcp add memory -- npx -y @13w/local-rag serve --config .memory.json
```

Global — available in all projects on this machine:

```shell
claude mcp add memory -s user -- npx -y @13w/local-rag serve --config .memory.json
```

Option B — `.mcp.json` directly
```json
{
  "mcpServers": {
    "memory": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@13w/local-rag", "serve", "--config", ".memory.json"]
    }
  }
}
```

Option C — after a global `npm install -g`

```shell
claude mcp add memory -- local-rag serve --config .memory.json
```

Serena provides filesystem access and precise symbolic code editing that complements local-rag: local-rag finds code by meaning, Serena reads and edits it surgically.
Repo: https://github.com/oraios/serena
Requirements: Python 3.10+, `uv`

```shell
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Register Serena as a Claude Code plugin (per-project)
claude mcp add serena -- uvx --from serena serena-mcp-server --context ide-assistant --project .
```

Or in `.mcp.json`:
```json
{
  "mcpServers": {
    "serena": {
      "type": "stdio",
      "command": "uvx",
      "args": ["--from", "serena", "serena-mcp-server", "--context", "ide-assistant", "--project", "."]
    }
  }
}
```

Both servers together in one `.mcp.json`:

```json
{
  "mcpServers": {
    "memory": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@13w/local-rag", "serve", "--config", ".memory.json"]
    },
    "serena": {
      "type": "stdio",
      "command": "uvx",
      "args": ["--from", "serena", "serena-mcp-server", "--context", "ide-assistant", "--project", "."]
    }
  }
}
```

Run `init` once in your project root after registering the MCP plugin.
It installs hooks that enforce the recall → search → remember protocol on every session and prompt, and writes reference guides into .claude/rules/ so Claude always has the tool conventions at hand.
```shell
npx @13w/local-rag init

# If installed globally
local-rag init
```

Output:
```text
wrote .claude/hooks/session-start.sh
wrote .claude/hooks/prompt-reminder.sh
wrote .claude/settings.json
wrote .claude/settings.local.json
wrote .claude/rules/continuous-remember.md
wrote .claude/rules/memory-protocol-reference.md
wrote .claude/rules/serena-conventions.md
```
What each file does:
| File | Purpose |
|---|---|
| `hooks/session-start.sh` | Injects the full protocol cheatsheet as a system-reminder at every session start and after context compaction |
| `hooks/prompt-reminder.sh` | Fires on every user prompt — reminds Claude to `recall()` before acting and `remember()` after |
| `rules/continuous-remember.md` | When and how to call `remember()` immediately (trigger events, format, anti-patterns) |
| `rules/memory-protocol-reference.md` | Full tool reference with parameter tables and call examples |
| `rules/serena-conventions.md` | Serena vs Memory MCP routing guide and end-to-end editing workflow |
| `settings.json` | Registers the hooks in Claude Code (commit this) |
| `settings.local.json` | Local hook overrides — add to `.gitignore` |
Commit `.claude/hooks/`, `.claude/rules/`, and `.claude/settings.json` to share the workflow with your team.
Create `.memory.json` in your project root (auto-discovered if present):

```json
{
  "project-id": "my-project",
  "project-root": ".",
  "qdrant-url": "http://localhost:6333",
  "embed-provider": "ollama",
  "embed-model": "embeddinggemma:300m",
  "ollama-url": "http://localhost:11434"
}
```

| Key | Default | Description |
|---|---|---|
| `project-id` | `"default"` | Isolates memories and code index per project |
| `project-root` | config file directory | Root path for code indexing |
| `qdrant-url` | `http://localhost:6333` | Qdrant REST API URL |
| `embed-provider` | `"ollama"` | Embedding provider: `ollama`, `openai`, `voyage` |
| `embed-model` | provider default¹ | Embedding model name |
| `embed-dim` | `1024` | Embedding vector dimension |
| `embed-api-key` | `""` | API key for OpenAI / Voyage embed providers — falls back to `OPENAI_API_KEY` / `VOYAGE_API_KEY` env var |
| `embed-url` | `""` | Custom embedding API endpoint |
| `ollama-url` | `http://localhost:11434` | Ollama API URL |
| `agent-id` | `"default"` | Agent identifier (for multi-agent setups) |
| `llm-provider` | `"ollama"` | LLM provider: `ollama`, `anthropic`, `openai` |
| `llm-model` | provider default² | LLM model for reranking / description generation |
| `llm-api-key` | `""` | API key for Anthropic / OpenAI LLM providers — falls back to `ANTHROPIC_API_KEY` / `OPENAI_API_KEY` env var |
| `llm-url` | `""` | Custom LLM API endpoint |
| `include-paths` | `[]` | Glob patterns to limit indexing scope (monorepos) |
| `generate-descriptions` | `false` | Auto-generate LLM descriptions for code chunks (slow) |
| `dashboard` | `true` | Enable the live dashboard HTTP server |
| `dashboard-port` | `0` | Dashboard HTTP port; `0` lets the OS pick a random port |
| `collection-prefix` | `""` | String prepended to all Qdrant collection names (useful on shared Qdrant instances) |
| `no-watch` | `false` | Disable automatic file re-indexing when files change (applies during `serve`) |
¹ `embed-model` defaults: `ollama` → `embeddinggemma:300m`, `openai` → `text-embedding-3-small`, `voyage` → `voyage-code-3`

² `llm-model` defaults: `ollama` → `gemma3n:e2b`, `anthropic` → `claude-haiku-4-5-20251001`, `openai` → `gpt-4o-mini`

Resolution order (highest to lowest priority): CLI flag → `.memory.json` value → environment variable → built-in default.

API key environment variables are provider-specific:

| Provider | `embed-api-key` env var | `llm-api-key` env var |
|---|---|---|
| `openai` | `OPENAI_API_KEY` | `OPENAI_API_KEY` |
| `voyage` | `VOYAGE_API_KEY` | — |
| `anthropic` | — | `ANTHROPIC_API_KEY` |

All other keys can also be passed as CLI flags (e.g. `--project-id foo`). CLI flags override config file values. `include-paths` is config-file only.
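As an illustration, a monorepo config limiting indexing to two packages and using OpenAI embeddings might look like this (key names follow the table above; the paths, model, and prefix are hypothetical):

```json
{
  "project-id": "my-monorepo",
  "embed-provider": "openai",
  "embed-model": "text-embedding-3-small",
  "include-paths": ["packages/api/**", "packages/core/**"],
  "collection-prefix": "team-a-"
}
```

Since `embed-api-key` is omitted, the key falls back to the `OPENAI_API_KEY` environment variable.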
`search_code` supports four modes via the `search_mode` parameter:

| Mode | Description |
|---|---|
| `hybrid` (default) | 3-way RRF fusion: code vector + description vector + lexical text leg |
| `code` | Code vector only — exact structural similarity |
| `semantic` | Description vector only — conceptual search when you don't know the name |
| `lexical` | Text index filter — only chunks where query terms literally appear in name or content |
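For instance (the query strings are illustrative):

```
search_code("token refresh flow", search_mode="semantic")  # describe the behaviour, not the name
search_code("extractImports", search_mode="lexical")       # the exact term must appear literally
search_code("retry loop with backoff", search_mode="code") # structural similarity only
```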
After vector retrieval, an optional cross-encoder pass (`Xenova/bge-reranker-base`) re-scores and reorders results for higher precision:

```
search_code("embedOne", rerank=true, rerank_k=50, top=5)
# Fetches 50 ANN candidates, scores all 50 with the cross-encoder, returns top 5
```
| Parameter | Default | Description |
|---|---|---|
| `rerank` | `false` | Enable cross-encoder reranking |
| `rerank_k` | `50` | ANN candidates to fetch before reranking |
| `top` | `limit` | Results to return after reranking |
```
search_code("embed vector", name_pattern="embed")
# Only returns chunks whose name contains "embed" (prefix-tokenized index)
```
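The retrieval knobs above can be combined in a single call. A hypothetical example, assuming `search_mode`, `name_pattern`, and the rerank parameters compose:

```
search_code("vector upsert", search_mode="hybrid", name_pattern="qdrant", rerank=true, top=3)
# Hybrid retrieval restricted to names containing "qdrant", reranked, top 3 returned
```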
Every symbol UUID surface (`search_code`, `get_file_context`) feeds directly into the two symbol tools:

```
# From search
search_code("parse imports typescript")
# → id: abc-123-... file: src/parser.ts name: extractImports

# From file listing
get_file_context("src/parser.ts")
# → function extractImports (lines 248–264) id: abc-123-...

# Read the symbol directly (no file I/O)
get_symbol("abc-123-...")

# Find all callers / references
find_usages("abc-123-...", limit=20)
# Returns [lexical] hits (literal name match) + [semantic] hits (conceptual match), self-excluded
```
`search_code` and `get_file_context` only return results once the project has been indexed:

```shell
# Index once
npx @13w/local-rag index . --config .memory.json

# Watch mode — re-indexes on file changes
npx @13w/local-rag watch . --config .memory.json

# If installed globally
local-rag index . --config .memory.json
local-rag watch . --config .memory.json
```

Other indexer commands:

```shell
local-rag clear --config .memory.json    # remove all indexed chunks
local-rag stats --config .memory.json    # show collection statistics
local-rag file <abs-path> <root>         # index a single file
local-rag repair . --config .memory.json # fix empty symbol names (payload-only, no re-embedding)
```

`repair` is useful after updating to a version with improved parser extraction logic: it patches only the `name` field for affected chunks without regenerating embeddings or descriptions.
`local-rag serve` automatically opens a browser dashboard on a local HTTP port.
It displays real-time tool call statistics (calls, bytes, latency, errors per tool),
a scrolling request log, a server info bar (project, branch, version, watch status),
and an interactive tool playground for testing calls manually.
The port is OS-assigned by default (printed to stderr as `[dashboard] http://localhost:PORT`).
To use a fixed port or disable the dashboard:
```json
{ "dashboard-port": 4242 }
```

```json
{ "dashboard": false }
```

| Type | Use for | Decay |
|---|---|---|
| `episodic` | Events, bugs, incidents | Time-decayed |
| `semantic` | Facts, architecture, decisions | Long-lived |
| `procedural` | Patterns, conventions, how-to | Long-lived |
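Illustrative `remember()` calls, one per type (the memory contents are hypothetical):

```
remember("Build broke after bumping tree-sitter; pinned to 0.21", type="episodic")
remember("All Qdrant collections are namespaced by project-id", type="semantic")
remember("Run local-rag repair after parser upgrades", type="procedural")
```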
Run `local-rag init` (see Agent workflow setup) to install the full
RECALL → SEARCH_CODE → THINK → ACT → REMEMBER protocol into your project.
The hooks fire automatically — no manual prompting required.