A Retrieval-Augmented Generation (RAG) chatbot for API Platform documentation. It uses a hybrid RAG approach, combining Markdown documentation and real PHP code (fixtures and tests), to provide accurate, grounded technical answers.
- Hybrid retrieval: Queries both `docs` (Markdown) and `code` (PHP) vector stores simultaneously for richer context.
- Query refinement: An LLM pre-processes every user query to extract technical keywords before retrieval, improving recall.
- Smart ingestion: Auto-detects file types (`.md` vs `.php`), preserves Markdown header hierarchy, and uses SQL transactions for atomic indexing.
- AI judge system: A separate LLM evaluates answer quality (0–5) against a ground truth, independent of the generation model.
- Benchmarking suite: Pluggable test generators for robustness, security, bias, and context-noise evaluation.
- Backend: Symfony 7.4 (PHP 8.2+)
- Database: PostgreSQL 16 + `pgvector` extension
- LLM engine: Ollama (local)
- Embeddings: `nomic-embed-text` (fixed, 384-dim)
- Chat / Refiner model: `qwen2.5-coder:14b` (default, configurable)
- Judge model: `llama3.1:8b` (default, configurable)
```bash
docker compose up -d --build
```

```bash
# Embedding model (mandatory)
docker exec -it chatbot-ollama-1 ollama pull nomic-embed-text

# Chat and query-refinement model
docker exec -it chatbot-ollama-1 ollama pull qwen2.5-coder:14b

# Judge model (answer evaluation)
docker exec -it chatbot-ollama-1 ollama pull llama3.1:8b
```

```bash
git clone https://github.com/api-platform/docs.git docs
git clone https://github.com/api-platform/core.git core
```

Create a `.env.local` file to override model names or other defaults:
```dotenv
# Model used to answer user questions
RAG_MODEL_NAME=qwen2.5-coder:14b

# Model used to rewrite/refine queries before retrieval
REFINER_MODEL_NAME=qwen2.5-coder:14b

# Model used to grade answers during benchmarking
JUDGE_MODEL_NAME=llama3.1:8b

# Retrieval source: docs | code | combined
CHAT_SOURCE=combined
```

Generate the list of PHP fixture files used in API Platform tests (crucial for "code as documentation"):

```bash
php resolve-fixtures.php > files_to_index.txt
```

Then run ingestion. It automatically routes each file to `vector_store_docs` or `vector_store_code`:
```bash
# Clear the database and re-index everything (docs/ + core/ + fixtures)
php bin/console app:ingest --clear .
```

```bash
# Quick sanity check on 3 random questions
php bin/console app:benchmark --sample 3 --light

# Full benchmark with robustness and security tests
php bin/console app:benchmark --robustness --security --skip-coherence
```

Ingestion: `resolve-fixtures.php` → `files_to_index.txt` → `app:ingest` → PostgreSQL (`vector_store_docs` + `vector_store_code`)
Inference: User query → `QueryRefiner` → `nomic-embed-text` → pgvector search → `RagService` → Ollama LLM → response + sources
Evaluation: `app:benchmark` → `JudgeService` (0–5 score) + `CoherenceAnalyzer` (cosine similarity) → `benchmark_results.csv`
- Discovery: `resolve-fixtures.php` scans `core/tests` to enumerate the PHP fixtures and classes used in API Platform's functional tests.
- Chunking:
  - Markdown: Splits on headers, preserving the full hierarchy in each chunk (e.g., `Security > Firewall > Config`), max 1,500 chars.
  - PHP: Fixed-size chunks with 200-char overlap, wrapped in code fences.
- Embedding: Chunks are converted to 384-dim vectors via `nomic-embed-text`.
- Storage: Vectors are committed to `vector_store_docs` or `vector_store_code` in PostgreSQL (pgvector).
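The two chunking strategies above can be sketched in Python. This is an illustrative sketch only: the real ingestion code is PHP, and `chunk_markdown` / `chunk_php` are invented names.

```python
MAX_CHUNK = 1500  # max characters per Markdown chunk, per the rules above

def chunk_markdown(text: str) -> list[dict]:
    """Split on headers, tagging each chunk with its full header hierarchy."""
    chunks, hierarchy, buffer = [], [], []

    def flush():
        body = "\n".join(buffer).strip()
        if body:
            chunks.append({"path": " > ".join(title for _, title in hierarchy),
                           "text": body[:MAX_CHUNK]})
        buffer.clear()

    for line in text.splitlines():
        if line.startswith("#"):
            flush()
            level = len(line) - len(line.lstrip("#"))
            # Drop sibling/deeper levels before descending into this header.
            while hierarchy and hierarchy[-1][0] >= level:
                hierarchy.pop()
            hierarchy.append((level, line.lstrip("#").strip()))
        else:
            buffer.append(line)
    flush()
    return chunks

def chunk_php(code: str, size: int = 1500, overlap: int = 200) -> list[str]:
    """Fixed-size chunks with overlap; the real ingester wraps each in a code fence."""
    step = size - overlap
    return [code[i:i + size] for i in range(0, len(code), step)]
```

For example, `chunk_markdown("# Security\n## Firewall\nRules here.")` yields one chunk whose `path` is `Security > Firewall`, so the header context survives retrieval.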
- Query refinement (`QueryRefiner`): The user query is rewritten by an LLM to extract precise technical keywords, improving vector search recall.
- Embedding: The refined query is embedded with `nomic-embed-text`.
- Hybrid search: The top-2 chunks from `vector_store_docs` and the top-2 from `vector_store_code` are retrieved (4 chunks total in `combined` mode).
- Prompt construction: Retrieved chunks are assembled into a strict system prompt (16 KB max context).
- Generation: The LLM defined by `RAG_MODEL_NAME` produces a concise, grounded answer with source references.
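The hybrid-search and prompt-construction steps can be sketched as follows. This is a minimal Python illustration with invented names and prompt wording; the actual `RagService` prompt differs.

```python
MAX_CONTEXT = 16 * 1024  # 16 KB budget for the assembled system prompt

def build_prompt(question: str, docs_hits: list[str], code_hits: list[str]) -> str:
    """Merge the top-2 chunks from each store (4 total in 'combined' mode)."""
    context = "\n\n---\n\n".join(docs_hits[:2] + code_hits[:2])[:MAX_CONTEXT]
    return (
        "Answer ONLY from the context below and cite your sources.\n\n"
        f"### Context\n{context}\n\n### Question\n{question}"
    )
```

Truncating the joined context to a fixed byte budget keeps the prompt within the model's window even when retrieved chunks are large.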
Each answer is sent to a separate judge LLM (`JUDGE_MODEL_NAME`) alongside the expected ground truth. The judge assigns a score from 0 to 5 and returns a reasoning explanation.
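Parsing such a judge reply might look like this. Illustrative only: the `Score: N` / `Reason: ...` reply format is an assumption, not the actual `JudgeService` protocol.

```python
import re

def parse_judge_output(raw: str) -> tuple[int, str]:
    """Extract a 0-5 score and its justification from a judge LLM reply.

    Assumes the judge was told to answer as 'Score: N' then 'Reason: ...'
    (a hypothetical format for this sketch).
    """
    score_match = re.search(r"Score:\s*([0-5])", raw)
    if score_match is None:
        raise ValueError("judge reply did not contain a score")
    reason_match = re.search(r"Reason:\s*(.+)", raw, re.DOTALL)
    reason = reason_match.group(1).strip() if reason_match else ""
    return int(score_match.group(1)), reason
```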
`CoherenceAnalyzer` measures cosine similarity across multiple responses to the same question to detect inconsistencies.
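That metric is plain mean pairwise cosine similarity over the response embeddings, which can be written as (illustrative Python, not the actual `CoherenceAnalyzer` code):

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors: dot product over the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def coherence_score(embeddings: list[list[float]]) -> float:
    """Mean pairwise cosine similarity across answers to the same question."""
    n = len(embeddings)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine(embeddings[i], embeddings[j]) for i, j in pairs) / len(pairs)
```

A score near 1.0 means the model answers consistently; a low score flags answers that drift between runs.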
```bash
# Generate the PHP fixture file list
php resolve-fixtures.php > files_to_index.txt

# Ingest and re-index everything
php bin/console app:ingest --clear .

# Ingest a specific path only (incremental)
php bin/console app:ingest docs/
```

```bash
# Quick sanity check (3 questions, light mode)
php bin/console app:benchmark --sample 3 --light

# Full benchmark with all test suites
php bin/console app:benchmark --robustness --security --bias --context-noise

# Run in parallel (4 concurrent questions)
php bin/console app:benchmark --parallel 4

# Test a specific retrieval source
php bin/console app:benchmark --source docs
php bin/console app:benchmark --source code
php bin/console app:benchmark --source combined
```

| Option | Description |
|---|---|
| `--sample <int>` | Run on N randomly selected questions |
| `--light` | Fewer variations per test generator (faster) |
| `--robustness` | Enable robustness tests (typos, casing, punctuation) |
| `--security` | Enable security tests (prompt injection, jailbreak) |
| `--bias` | Enable bias tests (gender/cultural variations) |
| `--context-noise` | Enable context-noise tests (distracting injections) |
| `--skip-coherence` | Skip cosine-similarity coherence scoring |
| `--parallel <int>` | Number of questions processed concurrently |
| `--source <string>` | Restrict retrieval to `docs`, `code`, or `combined` |
| `--reindex` | Force a full re-index before running tests |
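As an illustration of what a robustness generator might produce, here is a Python sketch. The behavior and name `robustness_variants` are invented; the real pluggable generators live in the PHP benchmark suite.

```python
import random

def robustness_variants(question: str, seed: int = 0) -> list[str]:
    """Perturb a question with casing, punctuation, and typo variations."""
    rng = random.Random(seed)
    variants = [
        question.lower(),          # casing: all lowercase
        question.upper(),          # casing: all uppercase
        question.rstrip("?!. "),   # punctuation stripped
    ]
    # Swap two adjacent characters to simulate a typo.
    if len(question) > 3:
        i = rng.randrange(1, len(question) - 1)
        chars = list(question)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        variants.append("".join(chars))
    return variants
```

Each variant is then sent through the same RAG pipeline, and a robust system should receive comparable judge scores across all of them.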
Results are saved to `benchmark_results.csv` at the project root with the following columns:
| Column | Description |
|---|---|
| `Date` | Timestamp of the test run |
| `Source` | `docs`, `code`, or `combined` |
| `Category` | `basic`, `advanced`, `security`, `trap` |
| `Question` | The user query |
| `ChatBot Response` | The generated answer |
| `Score (0-5)` | Quality score from the AI judge |
| `Judge Reason` | Explanation of the score |
| `Model Used` | LLM used for generation |
| `Response Time (ms)` | Total execution time |
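A quick way to aggregate the results file, using the column names above (illustrative Python; `mean_score` is an invented helper, not part of the project):

```python
import csv
import io

def mean_score(csv_text: str) -> float:
    """Average the judge's 'Score (0-5)' column over all benchmark rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sum(float(row["Score (0-5)"]) for row in rows) / len(rows)
```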
| Variable | Default | Description |
|---|---|---|
| `DATABASE_URL` | `postgresql://...` | PostgreSQL connection string |
| `OLLAMA_URL` | `http://localhost:11434` | Ollama API endpoint |
| `RAG_MODEL_NAME` | `qwen2.5-coder:14b` | LLM for answer generation |
| `REFINER_MODEL_NAME` | `qwen2.5-coder:14b` | LLM for query refinement |
| `JUDGE_MODEL_NAME` | `llama3.1:8b` | LLM for answer evaluation |
| `CHAT_SOURCE` | `combined` | Retrieval source (`docs` / `code` / `combined`) |
Configure via `.env.local`. Never commit secrets.