A Retrieval-Augmented Generation (RAG) system built with open-source models.
This project demonstrates how to build an end-to-end pipeline that:
- Ingests multiple document formats (PDF, TXT, DOCX)
- Performs semantic retrieval using vector embeddings
- Generates context-aware answers using an LLM
- Evaluates response quality using RAGAS metrics
- ✅ Multi-format document ingestion (PDF, TXT, DOCX)
- ✅ Chunking strategy for optimal retrieval
- ✅ Dense vector search using embeddings
- ✅ Local LLM inference via Ollama
- ✅ Persistent vector storage with ChromaDB
- ✅ Built-in evaluation using RAGAS
- ✅ Clean, modular pipeline (easy to extend)
Retrieval-Augmented Generation (RAG) enhances LLM responses by grounding them in external data. A plain LLM generates answers from its pre-trained knowledge only. A RAG system instead:
- Retrieves relevant information from your documents
- Uses that retrieved context to generate accurate, grounded responses
```
User Query
    ↓
Embedding Model
    ↓
Vector Similarity Search (Retriever)
    ↓
Relevant Context Chunks
    ↓
LLM (Answer Generation)
    ↓
Final Response
    ↓
RAGAS Evaluation
```
This project uses:
- Embedding Model: `nomic-embed-text`
- Vector Store: ChromaDB
- Retrieval: Top-K similarity search
| Type | Description | Used |
|---|---|---|
| Symmetric | Query and documents are similar in size/style | ❌ |
| Asymmetric | Short query vs. long documents | ✅ |
📌 This implementation uses Asymmetric Semantic Search, which is ideal for:
- Question-answering systems
- Document retrieval use cases
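As an illustration, the asymmetric retrieval step (short query scored against longer chunks) can be mimicked in plain Python. The term-count "embedding" below is a toy stand-in for `nomic-embed-text`, and `retrieve` is a hypothetical helper, not code from this project:

```python
import math
import re

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy embedding: term counts over a fixed vocabulary.
    A real model like nomic-embed-text produces dense semantic vectors instead."""
    counts: dict[str, int] = {}
    for tok in tokenize(text):
        counts[tok] = counts.get(tok, 0) + 1
    return [float(counts.get(term, 0)) for term in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by cosine similarity to the query and return the top k."""
    vocab = sorted({t for text in chunks + [query] for t in tokenize(text)})
    q_vec = embed(query, vocab)
    return sorted(chunks, key=lambda c: cosine(q_vec, embed(c, vocab)), reverse=True)[:k]

chunks = [
    "ChromaDB stores text chunks together with their embeddings.",
    "Ollama runs large language models locally on your machine.",
    "Chunk overlap preserves context across chunk boundaries.",
]
print(retrieve("Where are embeddings stored?", chunks, k=1))
# → ['ChromaDB stores text chunks together with their embeddings.']
```

The short question shares only one term with the best chunk, yet still ranks it first; real embeddings do the same matching semantically rather than lexically.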
```
rag-project/
├── main.py
├── data/               # Input documents
│   ├── sample.pdf
│   ├── notes.txt
│   └── report.docx
├── chroma_db/          # Persistent vector store
├── requirements.txt
└── README.md
```
Supports:
- PDF (`PyPDFLoader`)
- TXT (`TextLoader`)
- DOCX (`Docx2txtLoader`)
```python
chunk_size = 500
chunk_overlap = 50
```
Ensures:
- Better semantic coherence
- Improved retrieval accuracy
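The sliding-window idea behind these parameters can be sketched in a few lines. This is a simplified character-level splitter for illustration, not the project's actual splitter:

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters, where each chunk
    repeats the last chunk_overlap characters of the previous one so that
    sentences cut at a boundary still appear whole in one chunk."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]
```

With the defaults, a 1000-character document yields three chunks, and each consecutive pair shares a 50-character overlap.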
- Model: `nomic-embed-text`
- Type: Dense semantic embeddings
- Converts text → vector representations
- Database: ChromaDB
- Stores:
  - Text chunks
  - Corresponding embeddings
- Persistence enabled (`chroma_db/`)
```python
retriever = vector_db.as_retriever()
```
- Performs similarity search
- Returns top-K relevant chunks
- Model: `llama3` (via Ollama)
- Role:
  - Context-aware answer generation
  - Evaluation (via RAGAS wrapper)
```
Answer only using the context below.
```
- Reduces hallucination
- Ensures grounded responses
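Assembling such a grounded prompt can look like the helper below. The function is hypothetical, and the "say you don't know" fallback is an optional addition beyond the project's one-line instruction:

```python
def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a grounded prompt: the model is told to answer only from
    the retrieved chunks, which curbs hallucination."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer only using the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Because the retrieved chunks are inlined verbatim, any claim in the answer can be traced back to a specific chunk.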
This project integrates RAGAS for systematic evaluation.
| Metric | Description |
|---|---|
| `context_precision` | Relevance of retrieved chunks to the question |
| `context_recall` | Coverage of the relevant information |
| `faithfulness` | Consistency of the answer with the retrieved context |
| `answer_relevancy` | Relevance of the answer to the question |
```
{
    question,
    answer,
    retrieved_contexts,
    contexts,
    ground_truth
}
```
⚠️ Replace `ground_truth` with expected answers for meaningful evaluation
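A small helper (hypothetical, not part of `main.py`) can assemble samples in that shape; `contexts` mirrors `retrieved_contexts` here because RAGAS versions differ in which field name they expect:

```python
def build_eval_record(question: str, answer: str,
                      retrieved_contexts: list[str], ground_truth: str) -> dict:
    """Build one evaluation sample in the schema shown above."""
    return {
        "question": question,
        "answer": answer,
        "retrieved_contexts": retrieved_contexts,
        # duplicated under the older RAGAS field name for compatibility
        "contexts": retrieved_contexts,
        "ground_truth": ground_truth,
    }
```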
Download and install Ollama, then pull the required models:

```shell
ollama pull llama3
ollama pull nomic-embed-text
```
Install Python dependencies:

```shell
pip install -r requirements.txt
```
Place your documents inside the `data/` folder.
| Concept | Description |
|---|---|
| Embeddings | Vector representation of text |
| Chunking | Splitting documents for processing |
| Retriever | Finds relevant content |
| Vector DB | Stores embeddings |
| LLM | Generates responses |
| RAGAS | Evaluates system quality |
- No keyword-based retrieval (BM25)
- No hybrid search (dense + sparse)
- No re-ranking layer
- Performance depends on embedding quality
- Hybrid Search (BM25 + Vector)
- Cross-encoder re-ranking
- Metadata filtering
- Query expansion techniques
- REST API (FastAPI)
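Of the listed extensions, hybrid search is easy to prototype with reciprocal rank fusion (RRF), which merges a sparse (keyword) ranking and a dense (embedding) ranking without having to reconcile their score scales. A toy sketch, not part of the current code:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids. Each document scores
    sum(1 / (k + rank)) over the lists it appears in (rank is 1-based);
    k=60 is the commonly used damping constant."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["doc3", "doc1", "doc2"]   # hypothetical keyword (sparse) ranking
dense_ranking = ["doc1", "doc2", "doc3"]  # hypothetical embedding (dense) ranking
print(reciprocal_rank_fusion([bm25_ranking, dense_ranking]))
# → ['doc1', 'doc3', 'doc2']
```

Documents ranked well by both retrievers rise to the top, which is why RRF is a common first step before adding a cross-encoder re-ranker.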
This project demonstrates a complete local RAG architecture, combining:
- Dense semantic retrieval
- Context-aware LLM generation
- Quantitative evaluation
🚀 A strong foundation for building production-grade AI applications.
- Enterprise document search
- Knowledge base assistants
- Internal Q&A systems
- Research assistants
This project was developed as part of my learning journey in Python. GPT models were used as a learning assistant to understand concepts and structure the code. The final implementation, testing, and project setup were completed by infoanupampal@gmail.com.
Built with ❤️ by anupamLab