A monorepo containing a NestJS backend and Next.js frontend for chatting with a local Llama 3.2 model via Ollama, with RAG (Retrieval-Augmented Generation) capabilities for context-aware responses.
- 🤖 Local LLM - Chat with Llama 3.2 via Ollama (no API keys needed)
- 🔍 RAG Integration - Context-aware responses using your own data
- 🗄️ PostgreSQL - Document storage and management
- 🎯 Qdrant - Vector database for semantic search
- ⏰ Auto-sync - Hourly cron job syncs documents to vector DB
- 🐳 Docker Ready - One command to run everything
The easiest way to run the entire application is with Docker. This will automatically set up Ollama, PostgreSQL, Qdrant, pull the models, and start both the backend and frontend.
- Docker and Docker Compose installed
- For GPU acceleration (optional): NVIDIA GPU with CUDA support
For systems with NVIDIA GPU:
```bash
docker compose up --build
```

For CPU-only systems:

```bash
docker compose -f docker-compose.cpu.yml up --build
```

That's it! 🎉 The application will:
- Start PostgreSQL database
- Start Qdrant vector database
- Start Ollama
- Automatically pull the LLM model (default: `llama3.2:1b`)
- Automatically pull the embedding model (`nomic-embed-text`)
- Seed sample documents into PostgreSQL
- Start the backend API
- Start the frontend
Open http://localhost:3000 and start chatting!
After startup, trigger a sync to index documents for RAG:
```bash
# Full sync (all documents)
curl -X POST http://localhost:3001/rag/sync

# Check sync status
curl http://localhost:3001/rag/sync/status
```

**Option 1: Using a .env file (Recommended)**
Create a .env file in the project root:
```bash
# .env
OLLAMA_MODEL=llama3.2:1b
EMBEDDING_MODEL=nomic-embed-text
```

Then simply run:

```bash
docker compose -f docker-compose.cpu.yml up --build
```

Change the model anytime by editing `.env`:

```bash
# .env
OLLAMA_MODEL=mistral
```

**Option 2: Command line**
```bash
# Use a different model
OLLAMA_MODEL=mistral docker compose -f docker-compose.cpu.yml up --build

# Or with llama3.2:3b
OLLAMA_MODEL=llama3.2:3b docker compose -f docker-compose.cpu.yml up --build
```

Popular models to try:

- `llama3.2:1b` - Fast, lightweight (default)
- `llama3.2:3b` - Better quality, still fast
- `mistral` - Great all-around model
- `codellama` - Optimized for code
- `phi3` - Microsoft's efficient model
```bash
docker compose down

# To also remove all data (PostgreSQL, Qdrant, Ollama models):
docker compose down -v
```

If you prefer to run the application manually without Docker:
- Node.js >= 18
- pnpm >= 9.x (`npm install -g pnpm`)
- Ollama installed and running locally
- PostgreSQL running locally
- Qdrant running locally (optional, for RAG)
- Install Ollama from ollama.com
- Pull the required models:
  ```bash
  ollama pull llama3.2:1b
  ollama pull nomic-embed-text
  ```

- Make sure Ollama is running (it listens on `http://localhost:11434` by default)
Create a database for the application:
```bash
createdb ai_agent
```

Run Qdrant locally:

```bash
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
```

```
custom-ai-agent/
├── apps/
│   ├── backend/           # NestJS API (port 3001)
│   │   └── src/
│   │       ├── chat/      # Chat module (LLM interaction)
│   │       ├── database/  # PostgreSQL entities & seeding
│   │       ├── embedding/ # Embedding service (Ollama)
│   │       ├── qdrant/    # Vector database service
│   │       └── rag/       # RAG module (sync, search, scheduler)
│   └── frontend/          # Next.js app (port 3000)
├── packages/
│   └── tsconfig/          # Shared TypeScript configs
├── turbo.json             # Turborepo pipeline
└── pnpm-workspace.yaml
```
```bash
pnpm install
```

Create a `.env` file in `apps/backend/`:
```bash
# Ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2:1b
EMBEDDING_MODEL=nomic-embed-text

# PostgreSQL
DATABASE_HOST=localhost
DATABASE_PORT=5432
DATABASE_USER=postgres
DATABASE_PASSWORD=postgres
DATABASE_NAME=ai_agent

# Qdrant
QDRANT_URL=http://localhost:6333
QDRANT_COLLECTION=documents
```

Run both apps in development mode:
```bash
pnpm dev
```

Or run them separately:

```bash
# Run only backend
pnpm dev:backend

# Run only frontend
pnpm dev:frontend
```

Build all apps:

```bash
pnpm build
```

- Ensure Ollama, PostgreSQL, and Qdrant are running
- Start the development servers with `pnpm dev`
- Open http://localhost:3000 in your browser
- Trigger a sync: `curl -X POST http://localhost:3001/rag/sync`
- Start chatting with RAG-powered AI!
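Under the hood, each chat message is a simple JSON POST from the frontend to the backend. A minimal typed client sketch — note that the `/chat` route path is an assumption here, not stated in this README; verify it against the backend's chat module:

```typescript
// Minimal chat client sketch. NOTE: the `/chat` route path is an
// assumption; check the backend's chat controller for the real route.

// Build the fetch options for one chat message.
function buildChatRequest(message: string) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  };
}

// Send a message and return the model's reply.
async function askAgent(baseUrl: string, message: string): Promise<string> {
  const res = await fetch(`${baseUrl}/chat`, buildChatRequest(message));
  if (!res.ok) throw new Error(`Chat request failed: ${res.status}`);
  const data = (await res.json()) as { response: string };
  return data.response;
}
```

`askAgent("http://localhost:3001", "What services do you offer?")` would then resolve to the same `response` string shown in the API examples.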
```
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│                 │      │                 │      │                 │
│   Next.js App   │────▶│   NestJS API    │────▶│     Ollama      │
│   (port 3000)   │      │   (port 3001)   │      │  (port 11434)   │
│                 │◀────│                 │◀────│  - LLM          │
└─────────────────┘      └────────┬────────┘      │  - Embeddings   │
     Frontend                     │               └─────────────────┘
                                  │
                     ┌────────────┴────────────┐
                     │                         │
                     ▼                         ▼
            ┌─────────────────┐       ┌─────────────────┐
            │   PostgreSQL    │       │     Qdrant      │
            │   (port 5432)   │       │   (port 6333)   │
            │    Documents    │──────▶│     Vectors     │
            └─────────────────┘       └─────────────────┘
               Source Data              Vector Search
```
- Ingestion: Documents in PostgreSQL → Chunked → Embedded → Stored in Qdrant
- Query: User question → Embedded → Qdrant similarity search → Top-k relevant chunks
- Generation: Relevant context + question → Ollama LLM → Response
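The three steps above can be sketched in code. The chunk size, overlap, and prompt template below are illustrative assumptions, not the backend's actual values:

```typescript
// Sketch of the ingestion and generation steps. Chunk size, overlap,
// and the prompt template are illustrative assumptions.

// Ingestion: split a document into overlapping chunks before embedding.
function chunkText(text: string, size = 500, overlap = 50): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break;
  }
  return chunks;
}

// Generation: combine the top-k retrieved chunks with the user's question.
function buildPrompt(contextChunks: string[], question: string): string {
  const context = contextChunks.map((c, i) => `[${i + 1}] ${c}`).join("\n");
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}
```

Each chunk is embedded with `nomic-embed-text` and upserted into the Qdrant collection; at query time the question is embedded the same way, and the nearest chunks feed `buildPrompt`.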
Send a message to the AI and receive a RAG-augmented response.
Request:
```json
{
  "message": "What services do you offer?"
}
```

Response:
```json
{
  "response": "Based on the available information, I offer Web3 and blockchain development services including smart contract development, dApp development, and blockchain consulting..."
}
```

Check the health status of Ollama and the configured model.
Response:
```json
{
  "status": "online",
  "model": "llama3.2:1b",
  "message": "Ready"
}
```

Trigger a full sync of all documents from PostgreSQL to Qdrant.
Response:
```json
{
  "success": true,
  "type": "full",
  "documentsSynced": 15,
  "chunksCreated": 23,
  "duration": 5420
}
```

Trigger an incremental sync (only documents modified since the last sync).
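An incremental sync can be as simple as filtering on each document's modification timestamp; a minimal sketch (the `updatedAt` field name is an assumption about the Document entity):

```typescript
// Illustrative incremental-sync filter: keep only documents modified
// after the last completed sync. The `updatedAt` field name is an
// assumption about the Document entity, not confirmed by this README.
interface DocRow {
  id: number;
  updatedAt: Date;
}

function modifiedSince(docs: DocRow[], lastSyncAt: Date | null): DocRow[] {
  if (lastSyncAt === null) return docs; // no prior sync → sync everything
  return docs.filter((d) => d.updatedAt.getTime() > lastSyncAt.getTime());
}
```

Only the documents this filter returns are re-chunked, re-embedded, and upserted into Qdrant.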
Get the current sync status and statistics.
Response:
```json
{
  "isSyncing": false,
  "lastSync": {
    "type": "full",
    "status": "completed",
    "documentsSynced": 15,
    "completedAt": "2024-01-15T10:30:00Z"
  },
  "qdrantPointsCount": 23,
  "documentsCount": 15
}
```

Test RAG search (debug endpoint).
Request:
```json
{
  "query": "blockchain development",
  "limit": 5
}
```

When running with Docker, the following services are created:
| Service | Container Name | Port | Description |
|---|---|---|---|
| `postgres` | `ai-agent-postgres` | 5432 | PostgreSQL database |
| `qdrant` | `ai-agent-qdrant` | 6333, 6334 | Qdrant vector database |
| `ollama` | `ai-agent-ollama` | 11434 | Ollama LLM runtime |
| `ollama-pull` | `ai-agent-ollama-pull` | - | One-time model puller |
| `backend` | `ai-agent-backend` | 3001 | NestJS API server |
| `frontend` | `ai-agent-frontend` | 3000 | Next.js web app |
| Variable | Default | Description |
|---|---|---|
| `OLLAMA_MODEL` | `llama3.2:1b` | The Ollama LLM model to use |
| `EMBEDDING_MODEL` | `nomic-embed-text` | The Ollama embedding model |
| `OLLAMA_BASE_URL` | `http://ollama:11434` | Ollama API URL (Docker) |
| `DATABASE_HOST` | `postgres` | PostgreSQL host |
| `DATABASE_PORT` | `5432` | PostgreSQL port |
| `DATABASE_USER` | `postgres` | PostgreSQL username |
| `DATABASE_PASSWORD` | `postgres` | PostgreSQL password |
| `DATABASE_NAME` | `ai_agent` | PostgreSQL database name |
| `QDRANT_URL` | `http://qdrant:6333` | Qdrant API URL |
| `QDRANT_COLLECTION` | `documents` | Qdrant collection name |
| `CORS_ORIGINS` | `http://localhost:3000` | Allowed CORS origins |
| `NEXT_PUBLIC_API_URL` | `http://localhost:3001` | Backend API URL for the frontend |
The application comes with sample seed data. To add your own documents:
- Edit `apps/backend/src/database/seeds/seed.service.ts`
- Modify the `SAMPLE_DOCUMENTS` array with your content
- Restart the application or call the force-seed endpoint
- Trigger a sync: `curl -X POST http://localhost:3001/rag/sync`
```typescript
{
  title: 'Document Title',
  content: 'Full text content to be embedded and searched...',
  category: 'faq', // Optional: for filtering
  metadata: { type: 'faq', priority: 'high' } // Optional: extra data
}
```

- **Model download is slow**: The first run downloads the models (~1.3 GB for `llama3.2:1b`, ~275 MB for `nomic-embed-text`). Be patient!
- **GPU not detected**: If you have an NVIDIA GPU but it isn't detected, ensure that:
  - The NVIDIA Container Toolkit is installed
  - You have the latest NVIDIA drivers
  - You use `docker compose` (not `docker-compose`)
- **Out of memory**: Try a smaller model:

  ```bash
  OLLAMA_MODEL=llama3.2:1b docker compose -f docker-compose.cpu.yml up --build
  ```

- **Port conflicts**: If ports 3000, 3001, 5432, 6333, or 11434 are in use, stop the conflicting services or change the ports in `docker-compose.yml`.
- **Database connection issues**: Ensure PostgreSQL is healthy before the backend starts; the Docker Compose healthchecks handle this automatically.
- **No context in responses**: Make sure you've run the sync:

  ```bash
  curl -X POST http://localhost:3001/rag/sync
  ```

- **Embedding errors**: Ensure the embedding model is pulled:

  ```bash
  ollama pull nomic-embed-text
  ```

- **Check sync status**:

  ```bash
  curl http://localhost:3001/rag/sync/status
  ```