sltnsrh/knowledge-base

Knowledge Base System - Semantic Search with Python, PostgreSQL & pgvector

Open-source AI knowledge base with semantic search, vector embeddings, and Claude MCP integration. Built with Python and PostgreSQL pgvector for LLM-powered document retrieval.

Features:

  • Semantic Search - Vector embeddings with OpenAI (text-embedding-3-small)
  • PostgreSQL + pgvector - Vector similarity operations and full-text search
  • Claude MCP Integration - Model Context Protocol server for Claude Code/Desktop
  • RAG Agent CLI - Interactive terminal agent with query improvement (Google Gemini)
  • Python Toolkit - Clean, modular API with type hints
  • Async Operations - Database-first writes with background file sync

System Architecture

Claude Code / Claude Desktop          Terminal (kbagent)
           │                                  │
           ▼                                  ▼
MCP Server (mcp_server.py)           RAG Agent (rag_agent.py)
  ├─ search_summaries()               ├─ Query improvement (Gemini)
  ├─ fetch_document()                 ├─ Document-only responses
  ├─ save_knowledge()                 └─ Interactive CLI
  ├─ update_document()                        │
  ├─ delete_document()                        │
  └─ list_categories()                        │
           │                                  │
           └──────────────┬───────────────────┘
                          ▼
              Knowledge Toolkit (toolkit.py)
                ├─ search_summaries_tool()
                ├─ fetch_document_tool()
                ├─ knowledge_store_tool()
                ├─ update_document_tool()
                ├─ delete_document_tool()
                └─ knowledge_list_categories()
                          │
                          ▼
              PostgreSQL + pgvector
                ├─ documents (full content)
                ├─ summaries (embeddings)
                └─ vector indexes

Quick Start Guide

1. Python Setup

git clone <repo>
cd knowledge-base
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

2. Database Setup

Supabase (Cloud):

# 1. Create PostgreSQL instance on Supabase
# 2. Run schema.sql in SQL Editor
# 3. Copy connection string from settings

Local Docker:

docker-compose up -d
psql -h localhost -U db_user -d knowledge -f schema.sql

3. Configure MCP Server

Add to your MCP config:

{
  "mcpServers": {
    "knowledge-base": {
      "type": "stdio",
      "command": "python",
      "args": ["src/knowledge_base/mcp_server.py"],
      "env": {
        "DATABASE_URL": "${DATABASE_URL}",
        "OPENAI_API_KEY": "${OPENAI_API_KEY}",
        "ENABLE_FILE_OPERATIONS": "${ENABLE_FILE_OPERATIONS:-true}",
        "KNOWLEDGE_DIR": "${KNOWLEDGE_DIR:-./knowledge}"
      }
    }
  }
}
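
The `${NAME}` and `${NAME:-default}` placeholders above follow shell-style syntax. How they are resolved depends on your MCP client, so the following is only a minimal sketch of those semantics, not the client's actual code. The `expand` helper and the `example_env` mapping are hypothetical names introduced for illustration.

```python
import re

# Matches ${NAME} and ${NAME:-default}
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")

def expand(template, env):
    """Hypothetical helper: resolve ${NAME} / ${NAME:-default} from the mapping `env`."""
    def substitute(match):
        name, default = match.group(1), match.group(2)
        current = env.get(name)
        if current:                    # set and non-empty: use it
            return current
        if default is not None:        # unset or empty: fall back to the default
            return default
        if current is None:            # unset and no default: treated as an error here
            raise KeyError(f"{name} is not set and has no default")
        return current                 # set but empty, no default
    return _PLACEHOLDER.sub(substitute, template)

config_env = {
    "DATABASE_URL": "${DATABASE_URL}",
    "ENABLE_FILE_OPERATIONS": "${ENABLE_FILE_OPERATIONS:-true}",
    "KNOWLEDGE_DIR": "${KNOWLEDGE_DIR:-./knowledge}",
}

# Example environment; in practice you would pass os.environ.
example_env = {"DATABASE_URL": "postgresql://user:password@host:5432/database"}

resolved = {key: expand(value, example_env) for key, value in config_env.items()}
print(resolved)
```

With only DATABASE_URL set, the two optional variables fall back to `true` and `./knowledge`.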

Or use CLI:

claude mcp add --transport stdio knowledge-base \
  --env DATABASE_URL="postgresql://user:password@host:5432/database" \
  --env OPENAI_API_KEY="sk-..." \
  -- python src/knowledge_base/mcp_server.py

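Use absolute paths in your config: a relative path such as src/knowledge_base/mcp_server.py resolves only if the client happens to start in the project directory. See .env.example for details.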

Now ask Claude: "Search my knowledge base for X"

4. RAG Agent CLI (kbagent)

Interactive terminal agent with query improvement and document-based responses.

Install:

# In project directory with venv activated
pip install -e .

# For global access, add to ~/.zshrc or ~/.bashrc:
alias kbagent="/path/to/knowledge-base/.venv/bin/kbagent"

Usage:

# Interactive mode
kbagent

# Single query
kbagent "What is semantic search?"

Features:

  • Query improvement: Clarifies ambiguous questions and rewrites queries to improve search results
  • Document-only responses: Answers strictly from knowledge base content
  • Source attribution with relevance scores
  • Commands: /help, /categories, /quit

Example session:

$ kbagent

Knowledge Base Agent
Type your question or /help for commands

You: python best practices
Analyzing query...

Based on the knowledge base documents, here are the key Python best practices...

Sources:
  - Python Style Guide (knowledgebase) [85%]
  - Clean Code Principles (knowledgebase) [72%]

Confidence: 78%

Requires GOOGLE_API_KEY in environment for query improvement (Gemini).

Direct Python Usage (Optional)

from knowledge_base import search_summaries_tool, knowledge_store_tool

# Search
results = search_summaries_tool("python best practices", limit=5)

# Save
response = knowledge_store_tool(
    title="New Knowledge",
    content="# Markdown content",
    category="knowledgebase"
)

Database Schema

Two tables with no data duplication:

  • documents: Full content, metadata, category (BIGSERIAL primary key)
  • summaries: Auto-generated summaries with vector embeddings (BIGSERIAL primary key, references documents with CASCADE delete)

See schema.sql for complete schema.
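
To make the CASCADE relationship concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. It is not the project's schema.sql: the column names are assumptions, the embedding column is omitted, and INTEGER PRIMARY KEY AUTOINCREMENT stands in for BIGSERIAL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Assumed, simplified columns; see schema.sql for the real definitions.
conn.executescript("""
CREATE TABLE documents (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    category TEXT NOT NULL
);
CREATE TABLE summaries (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    summary TEXT NOT NULL
);
""")

cur = conn.execute(
    "INSERT INTO documents (title, content, category) VALUES (?, ?, ?)",
    ("Python Style Guide", "# Markdown content", "knowledgebase"),
)
doc_id = cur.lastrowid
conn.execute(
    "INSERT INTO summaries (document_id, summary) VALUES (?, ?)",
    (doc_id, "Short summary of the style guide"),
)

# Deleting the document also removes its summary row.
conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))
remaining = conn.execute("SELECT COUNT(*) FROM summaries").fetchone()[0]
print(remaining)
```

Because the summary row references its document, only the documents table needs an explicit DELETE.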


Python API Reference

search_summaries_tool(query, category=None, limit=5, min_relevance=None)

  • Returns: SummarySearchResponse with results list
  • Each result: document_id, title, summary, relevance_score
  • Optional min_relevance filter (0.0-1.0)
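
As an illustration of how `limit` and `min_relevance` might interact, the sketch below ranks toy vectors in plain Python. It assumes that relevance_score behaves like cosine similarity, which this README does not confirm; in the real toolkit the ranking is done by pgvector. The `rank` helper and the sample vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query_vec, summaries, limit=5, min_relevance=None):
    """Hypothetical helper: score every summary, filter, sort descending, truncate."""
    scored = [
        {"document_id": doc_id, "relevance_score": cosine_similarity(query_vec, vec)}
        for doc_id, vec in summaries.items()
    ]
    if min_relevance is not None:
        scored = [r for r in scored if r["relevance_score"] >= min_relevance]
    scored.sort(key=lambda r: r["relevance_score"], reverse=True)
    return scored[:limit]

# Toy 3-dimensional "embeddings"; real embeddings have far more dimensions.
summaries = {
    1: [1.0, 0.0, 0.0],
    2: [0.6, 0.8, 0.0],
    3: [0.0, 0.0, 1.0],
}

results = rank([1.0, 0.0, 0.0], summaries, limit=5, min_relevance=0.5)
print([r["document_id"] for r in results])
```

Document 3 is orthogonal to the query, so the `min_relevance=0.5` filter drops it before `limit` is applied.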

fetch_document_tool(document_id)

  • Returns: DocumentResponse with full content, metadata

knowledge_store_tool(title, content, category, tags=None, description=None)

  • Returns: OperationResponse with document_id
  • Async: File write and summary generation happen in background

update_document_tool(document_id, content)

  • Returns: OperationResponse with updated document metadata
  • Updates content only (title/category unchanged)
  • Async: File update and summary regeneration in background

delete_document_tool(document_id)

  • Returns: OperationResponse with deleted document info
  • Permanent operation (cannot be undone)
  • Async: File cleanup in background

knowledge_list_categories()

  • Returns: CategoriesResponse with all categories and counts
