A Retrieval-Augmented Generation (RAG) chatbot for API Platform documentation. It uses a hybrid RAG approach, combining Markdown documentation and real PHP code (fixtures and tests), to provide accurate, grounded technical answers.
- Hybrid retrieval: Queries both `docs` (Markdown) and `code` (PHP) vector stores simultaneously for richer context.
- Query refinement: An LLM pre-processes every user query to extract technical keywords before retrieval, improving recall.
- Smart ingestion: Auto-detects file types (`.md` vs `.php`), preserves Markdown header hierarchy, and uses SQL transactions for atomic indexing.
- AI judge system: A separate LLM evaluates answer quality (0–5) against a ground truth, independent of the generation model.
- Benchmarking suite: Pluggable test generators for robustness, security, bias, and context-noise evaluation.
- Backend: Symfony 7.4 (PHP 8.2+)
- Database: PostgreSQL 16 + `pgvector` extension
- LLM engine: Ollama (local)
- Embeddings: `nomic-embed-text` (fixed, 384-dim)
- Chat / Refiner model: `qwen2.5-coder:14b` (default, configurable)
- Judge model: `llama3.1:8b` (default, configurable)
```bash
docker compose up -d --build
```

```bash
# Embedding model (mandatory)
docker exec -it chatbot-ollama-1 ollama pull nomic-embed-text

# Chat and query-refinement model
docker exec -it chatbot-ollama-1 ollama pull qwen2.5-coder:14b

# Judge model (answer evaluation)
docker exec -it chatbot-ollama-1 ollama pull llama3.1:8b
```

```bash
git clone https://github.com/api-platform/docs.git docs
git clone https://github.com/api-platform/core.git core
```

Create a `.env.local` file to override model names or other defaults:
```dotenv
# Model used to answer user questions
RAG_MODEL_NAME=qwen2.5-coder:14b

# Model used to rewrite/refine queries before retrieval
REFINER_MODEL_NAME=qwen2.5-coder:14b

# Model used to grade answers during benchmarking
JUDGE_MODEL_NAME=llama3.1:8b

# Retrieval source: docs | code | combined
CHAT_SOURCE=combined
```

Generate the list of PHP fixture files used in API Platform tests (crucial for "code as documentation"):

```bash
php resolve-fixtures.php > files_to_index.txt
```

Then run ingestion. It automatically routes each file to `vector_store_docs` or `vector_store_code`:
```bash
# Clear the database and re-index everything (docs/ + core/ + fixtures)
php bin/console app:ingest --clear .
```

```bash
# Quick sanity check on 3 random questions
php bin/console app:benchmark --sample 3 --light

# Full benchmark with robustness and security tests
php bin/console app:benchmark --robustness --security --skip-coherence
```

Ingestion: `resolve-fixtures.php` → `files_to_index.txt` → `app:ingest` → PostgreSQL (`vector_store_docs` + `vector_store_code`)
Inference: User query → `QueryRefiner` → `nomic-embed-text` → pgvector search → `RagService` → Ollama LLM → response + sources
Evaluation: `app:benchmark` → `JudgeService` (0–5 score) + `CoherenceAnalyzer` (cosine similarity) → `benchmark_results.csv`
- Discovery: `resolve-fixtures.php` scans `core/tests` to enumerate the PHP fixtures and classes used in API Platform's functional tests.
- Chunking:
  - Markdown: Splits on headers, preserving the full hierarchy in each chunk (e.g., `Security > Firewall > Config`), max 1,500 chars.
  - PHP: Fixed-size chunks with 200-char overlap, wrapped in code fences.
- Embedding: Chunks are converted to 384-dim vectors via `nomic-embed-text`.
- Storage: Vectors are committed to `vector_store_docs` or `vector_store_code` in PostgreSQL (pgvector).
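The two chunking strategies above can be sketched in Python. This is an illustrative sketch only: the real ingestion code is PHP, and `chunk_markdown` / `chunk_php` are invented names.

```python
MAX_CHUNK = 1500  # max characters per Markdown chunk, per the rules above

def chunk_markdown(text: str) -> list[dict]:
    """Split on headers, tagging each chunk with its full header hierarchy."""
    chunks, hierarchy, buffer = [], [], []

    def flush():
        body = "\n".join(buffer).strip()
        if body:
            chunks.append({"path": " > ".join(title for _, title in hierarchy),
                           "text": body[:MAX_CHUNK]})
        buffer.clear()

    for line in text.splitlines():
        if line.startswith("#"):
            flush()
            level = len(line) - len(line.lstrip("#"))
            # Drop sibling/deeper levels before descending into this header.
            while hierarchy and hierarchy[-1][0] >= level:
                hierarchy.pop()
            hierarchy.append((level, line.lstrip("#").strip()))
        else:
            buffer.append(line)
    flush()
    return chunks

def chunk_php(code: str, size: int = 1500, overlap: int = 200) -> list[str]:
    """Fixed-size chunks with overlap; the real ingester wraps each in a code fence."""
    step = size - overlap
    return [code[i:i + size] for i in range(0, len(code), step)]
```

For example, `chunk_markdown("# Security\n## Firewall\nRules here.")` yields one chunk whose `path` is `Security > Firewall`, so the header context survives retrieval.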
- Query refinement (`QueryRefiner`): The user query is rewritten by an LLM to extract precise technical keywords, improving vector search recall.
- Embedding: The refined query is embedded with `nomic-embed-text`.
- Hybrid search: The top-2 chunks from `vector_store_docs` and the top-2 from `vector_store_code` are retrieved (4 chunks total in `combined` mode).
- Prompt construction: Retrieved chunks are assembled into a strict system prompt (16 KB max context).
- Generation: The LLM defined by `RAG_MODEL_NAME` produces a concise, grounded answer with source references.
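The hybrid-search and prompt-construction steps can be sketched as follows. This is a minimal Python illustration with invented names and prompt wording; the actual `RagService` prompt differs.

```python
MAX_CONTEXT = 16 * 1024  # 16 KB budget for the assembled system prompt

def build_prompt(question: str, docs_hits: list[str], code_hits: list[str]) -> str:
    """Merge the top-2 chunks from each store (4 total in 'combined' mode)."""
    context = "\n\n---\n\n".join(docs_hits[:2] + code_hits[:2])[:MAX_CONTEXT]
    return (
        "Answer ONLY from the context below and cite your sources.\n\n"
        f"### Context\n{context}\n\n### Question\n{question}"
    )
```

Truncating the joined context to a fixed byte budget keeps the prompt within the model's window even when retrieved chunks are large.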
Each answer is sent to a separate judge LLM (`JUDGE_MODEL_NAME`) alongside the expected ground truth. The judge assigns a score from 0 to 5 and returns a reasoning explanation.
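Parsing such a judge reply might look like this. Illustrative only: the `Score: N` / `Reason: ...` reply format is an assumption, not the actual `JudgeService` protocol.

```python
import re

def parse_judge_output(raw: str) -> tuple[int, str]:
    """Extract a 0-5 score and its justification from a judge LLM reply.

    Assumes the judge was told to answer as 'Score: N' then 'Reason: ...'
    (a hypothetical format for this sketch).
    """
    score_match = re.search(r"Score:\s*([0-5])", raw)
    if score_match is None:
        raise ValueError("judge reply did not contain a score")
    reason_match = re.search(r"Reason:\s*(.+)", raw, re.DOTALL)
    reason = reason_match.group(1).strip() if reason_match else ""
    return int(score_match.group(1)), reason
```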
`CoherenceAnalyzer` measures cosine similarity across multiple responses to the same question to detect inconsistencies.
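That metric is plain mean pairwise cosine similarity over the response embeddings, which can be written as (illustrative Python, not the actual `CoherenceAnalyzer` code):

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors: dot product over the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def coherence_score(embeddings: list[list[float]]) -> float:
    """Mean pairwise cosine similarity across answers to the same question."""
    n = len(embeddings)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine(embeddings[i], embeddings[j]) for i, j in pairs) / len(pairs)
```

A score near 1.0 means the model answers consistently; a low score flags answers that drift between runs.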
```bash
# Generate the PHP fixture file list
php resolve-fixtures.php > files_to_index.txt

# Ingest and re-index everything
php bin/console app:ingest --clear .

# Ingest a specific path only (incremental)
php bin/console app:ingest docs/
```

```bash
# Quick sanity check (3 questions, light mode)
php bin/console app:benchmark --sample 3 --light

# Full benchmark with all test suites
php bin/console app:benchmark --robustness --security --bias --context-noise

# Run in parallel (4 concurrent questions)
php bin/console app:benchmark --parallel 4

# Test a specific retrieval source
php bin/console app:benchmark --source docs
php bin/console app:benchmark --source code
php bin/console app:benchmark --source combined
```

| Option | Description |
|---|---|
| `--sample <int>` | Run on N randomly selected questions |
| `--light` | Fewer variations per test generator (faster) |
| `--robustness` | Enable robustness tests (typos, casing, punctuation) |
| `--security` | Enable security tests (prompt injection, jailbreak) |
| `--bias` | Enable bias tests (gender/cultural variations) |
| `--context-noise` | Enable context-noise tests (distracting injections) |
| `--skip-coherence` | Skip cosine-similarity coherence scoring |
| `--parallel <int>` | Number of questions processed concurrently |
| `--source <string>` | Restrict retrieval to `docs`, `code`, or `combined` |
| `--reindex` | Force a full re-index before running tests |
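As an illustration of what a robustness generator might produce, here is a Python sketch. The behavior and name `robustness_variants` are invented; the real pluggable generators live in the PHP benchmark suite.

```python
import random

def robustness_variants(question: str, seed: int = 0) -> list[str]:
    """Perturb a question with casing, punctuation, and typo variations."""
    rng = random.Random(seed)
    variants = [
        question.lower(),          # casing: all lowercase
        question.upper(),          # casing: all uppercase
        question.rstrip("?!. "),   # punctuation stripped
    ]
    # Swap two adjacent characters to simulate a typo.
    if len(question) > 3:
        i = rng.randrange(1, len(question) - 1)
        chars = list(question)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        variants.append("".join(chars))
    return variants
```

Each variant is then sent through the same RAG pipeline, and a robust system should receive comparable judge scores across all of them.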
Results are saved to `benchmark_results.csv` at the project root with the following columns:
| Column | Description |
|---|---|
| `Date` | Timestamp of the test run |
| `Source` | `docs`, `code`, or `combined` |
| `Category` | `basic`, `advanced`, `security`, `trap` |
| `Question` | The user query |
| `ChatBot Response` | The generated answer |
| `Score (0-5)` | Quality score from the AI judge |
| `Judge Reason` | Explanation of the score |
| `Model Used` | LLM used for generation |
| `Response Time (ms)` | Total execution time |
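A quick way to aggregate the results file, using the column names above (illustrative Python; `mean_score` is an invented helper, not part of the project):

```python
import csv
import io

def mean_score(csv_text: str) -> float:
    """Average the judge's 'Score (0-5)' column over all benchmark rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sum(float(row["Score (0-5)"]) for row in rows) / len(rows)
```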
| Variable | Default | Description |
|---|---|---|
| `DATABASE_URL` | `postgresql://...` | PostgreSQL connection string |
| `OLLAMA_URL` | `http://localhost:11434` | Ollama API endpoint |
| `RAG_MODEL_NAME` | `qwen2.5-coder:14b` | LLM for answer generation |
| `REFINER_MODEL_NAME` | `qwen2.5-coder:14b` | LLM for query refinement |
| `JUDGE_MODEL_NAME` | `llama3.1:8b` | LLM for answer evaluation |
| `CHAT_SOURCE` | `combined` | Retrieval source (`docs` / `code` / `combined`) |
Configure via `.env.local`. Never commit secrets.