Knowledge Base System - Semantic Search with Python, PostgreSQL & pgvector

Open-source AI knowledge base with semantic search, vector embeddings, and Claude MCP integration. Built with Python and PostgreSQL pgvector for LLM-powered document retrieval.

Features:

Semantic Search - Vector embeddings with OpenAI (text-embedding-3-small)
PostgreSQL + pgvector - Vector similarity operations and full-text search
Claude MCP Integration - Model Context Protocol server for Claude Code/Desktop
RAG Agent CLI - Interactive terminal agent with query improvement (Google Gemini)
Python Toolkit - Clean, modular API with type hints
Async Operations - Database-first writes with background file sync

System Architecture

Claude Code / Claude Desktop          Terminal (kbagent)
           │                                  │
           ▼                                  ▼
MCP Server (mcp_server.py)           RAG Agent (rag_agent.py)
  ├─ search_summaries()               ├─ Query improvement (Gemini)
  ├─ fetch_document()                 ├─ Document-only responses
  ├─ save_knowledge()                 └─ Interactive CLI
  ├─ update_document()                        │
  ├─ delete_document()                        │
  └─ list_categories()                        │
           │                                  │
           └──────────────┬───────────────────┘
                          ▼
              Knowledge Toolkit (toolkit.py)
                ├─ search_summaries_tool()
                ├─ fetch_document_tool()
                ├─ knowledge_store_tool()
                ├─ update_document_tool()
                ├─ delete_document_tool()
                └─ knowledge_list_categories()
                          │
                          ▼
              PostgreSQL + pgvector
                ├─ documents (full content)
                ├─ summaries (embeddings)
                └─ vector indexes

Quick Start Guide

1. Python Setup

git clone <repo>
cd knowledge-base
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

2. Database Setup

Supabase (Cloud):

# 1. Create PostgreSQL instance on Supabase
# 2. Run schema.sql in SQL Editor
# 3. Copy connection string from settings

Local Docker:

docker-compose up -d
psql -h localhost -U db_user -d knowledge -f schema.sql

3. Configure MCP Server

Add to your MCP config:

{
  "mcpServers": {
    "knowledge-base": {
      "type": "stdio",
      "command": "python",
      "args": ["src/knowledge_base/mcp_server.py"],
      "env": {
        "DATABASE_URL": "${DATABASE_URL}",
        "OPENAI_API_KEY": "${OPENAI_API_KEY}",
        "ENABLE_FILE_OPERATIONS": "${ENABLE_FILE_OPERATIONS:-true}",
        "KNOWLEDGE_DIR": "${KNOWLEDGE_DIR:-./knowledge}"
      }
    }
  }
}

Or use the Claude Code CLI:

claude mcp add --transport stdio knowledge-base \
  --env DATABASE_URL="postgresql://user:password@host:5432/database" \
  --env OPENAI_API_KEY="sk-..." \
  -- python src/knowledge_base/mcp_server.py

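Use absolute paths in the config for the Python interpreter and for mcp_server.py, since relative paths depend on the directory the MCP client launches the server from. See .env.example for details.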
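One way to avoid hand-editing paths is to generate the config. The sketch below is not part of the project: build_mcp_config is a hypothetical helper, and it assumes the repository layout shown in the config above (src/knowledge_base/mcp_server.py) and a virtual environment in .venv.

```python
import json
import os


def build_mcp_config(project_root: str, python_exe: str) -> dict:
    """Return an MCP config dict whose interpreter and script paths are absolute.

    Hypothetical helper for illustration; the layout is taken from the
    example config in this README.
    """
    # abspath (not realpath) keeps the venv's python symlink intact,
    # so the server still runs inside the virtual environment.
    root = os.path.abspath(project_root)
    script = os.path.join(root, "src", "knowledge_base", "mcp_server.py")
    return {
        "mcpServers": {
            "knowledge-base": {
                "type": "stdio",
                "command": os.path.abspath(python_exe),
                "args": [script],
                "env": {
                    "DATABASE_URL": "${DATABASE_URL}",
                    "OPENAI_API_KEY": "${OPENAI_API_KEY}",
                },
            }
        }
    }


# Print a config for the current checkout; paste the output into your MCP config.
config = build_mcp_config(".", os.path.join(".venv", "bin", "python"))
print(json.dumps(config, indent=2))
```

Run it from the project root with the venv created in step 1, then copy the printed JSON into your MCP config file.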

Now ask Claude: "Search my knowledge base for X"

4. RAG Agent CLI (kbagent)

Interactive terminal agent with query improvement and document-based responses.

Install:

# In project directory with venv activated
pip install -e .

# For global access, add to ~/.zshrc or ~/.bashrc:
alias kbagent="/path/to/knowledge-base/.venv/bin/kbagent"

Usage:

# Interactive mode
kbagent

# Single query
kbagent "What is semantic search?"

Features:

Query improvement: Clarifies unclear questions, enhances queries for better search
Document-only responses: Answers strictly from knowledge base content
Source attribution with relevance scores
Commands: /help, /categories, /quit

Example session:

$ kbagent

Knowledge Base Agent
Type your question or /help for commands

You: python best practices
Analyzing query...

Based on the knowledge base documents, here are the key Python best practices...

Sources:
  - Python Style Guide (knowledgebase) [85%]
  - Clean Code Principles (knowledgebase) [72%]

Confidence: 78%

Query improvement uses Gemini and requires GOOGLE_API_KEY to be set in the environment.
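A quick pre-flight check can catch a missing key before the agent starts. This is a minimal sketch, not project code: missing_env_vars is a hypothetical helper, and the assumption that kbagent needs DATABASE_URL and OPENAI_API_KEY in addition to GOOGLE_API_KEY is inferred from the MCP config above rather than stated by the project.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_vars(
    required: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names in `required` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Assumed variable set; adjust to match your .env file.
missing = missing_env_vars(["DATABASE_URL", "OPENAI_API_KEY", "GOOGLE_API_KEY"])
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```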

Direct Python Usage (Optional)

from knowledge_base import search_summaries_tool, knowledge_store_tool

# Search
results = search_summaries_tool("python best practices", limit=5)

# Save
response = knowledge_store_tool(
    title="New Knowledge",
    content="# Markdown content",
    category="knowledgebase"
)

Database Schema

Two tables with no data duplication:

documents: Full content, metadata, category (BIGSERIAL primary key)
summaries: Auto-generated summaries with vector embeddings (BIGSERIAL primary key, references documents with CASCADE delete)

See schema.sql for complete schema.
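To illustrate the CASCADE relationship between the two tables, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The column names are assumptions for illustration (schema.sql is the authoritative definition), INTEGER PRIMARY KEY stands in for BIGSERIAL, and the embedding column is omitted because SQLite has no vector type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        content TEXT NOT NULL,
        category TEXT NOT NULL
    );
    CREATE TABLE summaries (
        id INTEGER PRIMARY KEY,
        document_id INTEGER NOT NULL
            REFERENCES documents(id) ON DELETE CASCADE,
        summary TEXT NOT NULL
    );
    """
)

# Store one document and its summary.
cur = conn.execute(
    "INSERT INTO documents (title, content, category) VALUES (?, ?, ?)",
    ("Python Style Guide", "# Markdown content", "knowledgebase"),
)
doc_id = cur.lastrowid
conn.execute(
    "INSERT INTO summaries (document_id, summary) VALUES (?, ?)",
    (doc_id, "Short summary of the style guide"),
)

# Deleting the document also removes its summary row via the cascade.
conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))
remaining = conn.execute("SELECT COUNT(*) FROM summaries").fetchone()[0]
print("summaries left after delete:", remaining)
```

Because the full content lives only in documents and summaries holds a reference, deleting a document needs a single statement and leaves no orphaned embeddings.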

Python API Reference

search_summaries_tool(query, category=None, limit=5, min_relevance=None)

Returns: SummarySearchResponse with results list
Each result: document_id, title, summary, relevance_score
Optional min_relevance filter (0.0-1.0)
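The interaction of limit and min_relevance can be sketched in plain Python. This is an illustration under stated assumptions, not the toolkit's implementation: the real search runs inside PostgreSQL with pgvector, the project does not state how relevance_score is derived (cosine similarity is assumed here), and SummaryResult and rank_summaries are hypothetical names that only mirror the documented result fields.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class SummaryResult:
    """Hypothetical stand-in mirroring the documented result fields."""
    document_id: int
    title: str
    summary: str
    relevance_score: float


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_summaries(
    query_vec: Sequence[float],
    rows: Sequence[Tuple[int, str, str, Sequence[float]]],
    limit: int = 5,
    min_relevance: Optional[float] = None,
) -> List[SummaryResult]:
    """Score every row, drop those below min_relevance, return the top `limit`."""
    scored = [
        SummaryResult(doc_id, title, summary, cosine_similarity(query_vec, vec))
        for doc_id, title, summary, vec in rows
    ]
    if min_relevance is not None:
        scored = [r for r in scored if r.relevance_score >= min_relevance]
    scored.sort(key=lambda r: r.relevance_score, reverse=True)
    return scored[:limit]


# Toy 3-dimensional embeddings; real embeddings have far more dimensions.
rows = [
    (1, "Python Style Guide", "Naming and layout rules", [0.9, 0.1, 0.0]),
    (2, "Clean Code Principles", "General design advice", [0.6, 0.8, 0.0]),
    (3, "Unrelated Note", "Something else entirely", [0.0, 0.0, 1.0]),
]
results = rank_summaries([1.0, 0.0, 0.0], rows, limit=5, min_relevance=0.5)
for r in results:
    print(r.document_id, r.title, round(r.relevance_score, 2))
```

In this toy data the third row scores 0.0 against the query and is filtered out by min_relevance before the limit is applied.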

fetch_document_tool(document_id)

Returns: DocumentResponse with full content, metadata

knowledge_store_tool(title, content, category, tags=None, description=None)

Returns: OperationResponse with document_id
Async: File write and summary generation happen in background

update_document_tool(document_id, content)

Returns: OperationResponse with updated document metadata
Updates content only (title/category unchanged)
Async: File update and summary regeneration in background

delete_document_tool(document_id)

Returns: OperationResponse with deleted document info
Permanent operation (cannot be undone)
Async: File cleanup in background

knowledge_list_categories()

Returns: CategoriesResponse with all categories and counts
