
📦 RAG Pipeline (Multi-Document + Evaluation)

A Retrieval-Augmented Generation (RAG) system built entirely with open-source models.
This project demonstrates how to build an end-to-end pipeline that:

  • Ingests multiple document formats (PDF, TXT, DOCX)
  • Performs semantic retrieval using vector embeddings
  • Generates context-aware answers using an LLM
  • Evaluates response quality using RAGAS metrics

🚀 Key Features

  • ✅ Multi-format document ingestion (PDF, TXT, DOCX)
  • ✅ Chunking strategy for optimal retrieval
  • ✅ Dense vector search using embeddings
  • ✅ Local LLM inference via Ollama
  • ✅ Persistent vector storage with ChromaDB
  • ✅ Built-in evaluation using RAGAS
  • ✅ Clean, modular pipeline (easy to extend)

🧠 What is RAG?

Retrieval-Augmented Generation (RAG) enhances LLM responses by grounding them in external data.

Standard LLM ❌

  • Generates answers from pre-trained knowledge only

RAG System ✅

  1. Retrieves relevant information from documents
  2. Uses that context to generate accurate responses

🔄 Pipeline Overview

User Query
    ↓
Embedding Model
    ↓
Vector Similarity Search (Retriever)
    ↓
Relevant Context Chunks
    ↓
LLM (Answer Generation)
    ↓
Final Response
    ↓
RAGAS Evaluation
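
The stages above can be sketched as a minimal pure-Python skeleton. Every function here is an illustrative stub standing in for the real component (nomic-embed-text, ChromaDB, llama3 via Ollama), not the project's actual implementation:

```python
# Minimal sketch of the RAG flow; each stage is a stub for the real component.

def embed(text: str) -> list[float]:
    # Stub: a real system calls the embedding model here.
    return [float(ord(c)) for c in text[:8]]

def retrieve(query_vector: list[float], k: int = 3) -> list[str]:
    # Stub: a real system runs top-K similarity search in the vector store.
    return ["chunk about topic A", "chunk about topic B"][:k]

def generate(query: str, context: list[str]) -> str:
    # Stub: a real system prompts the LLM with the retrieved context.
    return f"Answer to '{query}' grounded in {len(context)} context chunks."

def answer_query(query: str) -> str:
    vector = embed(query)
    chunks = retrieve(vector)
    return generate(query, chunks)

print(answer_query("What is RAG?"))
```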

πŸ” Retrieval Methodology

This project uses:

βœ… Dense Vector Similarity Search

  • Embedding Model: nomic-embed-text
  • Vector Store: ChromaDB
  • Retrieval: Top-K similarity search

🔄 Symmetric vs Asymmetric Search

Type        Description                                     Used
Symmetric   Query and documents are similar in size/style   ❌
Asymmetric  Short query vs long documents                   ✅

👉 This implementation uses Asymmetric Semantic Search, which is ideal for:

  • Question-answering systems
  • Document retrieval use cases

πŸ—οΈ Project Structure

rag-project/
β”‚
β”œβ”€β”€ main.py 
β”‚
β”œβ”€β”€ data/                   # Input documents
β”‚   β”œβ”€β”€ sample.pdf
β”‚   β”œβ”€β”€ notes.txt
β”‚   β”œβ”€β”€ report.docx
β”‚
β”œβ”€β”€ chroma_db/              # Persistent vector store
β”‚
β”œβ”€β”€ requirements.txt
β”‚
└── README.md   

βš™οΈ System Components

1️⃣ Document Loader

Supports:

  • PDF (PyPDFLoader)
  • TXT (TextLoader)
  • DOCX (Docx2txtLoader)
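
One simple way to route each file to the right loader is a file-extension dispatch. The sketch below (function name `loader_for` is illustrative) returns the loader class names from the list above as strings so it stays dependency-free:

```python
from pathlib import Path

# Map each supported extension to its loader (names from the list above).
LOADERS = {
    ".pdf": "PyPDFLoader",
    ".txt": "TextLoader",
    ".docx": "Docx2txtLoader",
}

def loader_for(path: str) -> str:
    # Dispatch on the (case-insensitive) file extension.
    suffix = Path(path).suffix.lower()
    if suffix not in LOADERS:
        raise ValueError(f"Unsupported file type: {suffix}")
    return LOADERS[suffix]

print(loader_for("data/report.docx"))  # Docx2txtLoader
```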

2️⃣ Text Chunking

chunk_size = 500
chunk_overlap = 50

Ensures:

  • Better semantic coherence
  • Improved retrieval accuracy
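
The effect of chunk_size and chunk_overlap can be shown with a simple character-window splitter. This is a conceptual sketch (the project may use a library splitter with smarter boundary handling); consecutive chunks share chunk_overlap characters:

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    # Each chunk starts chunk_size - chunk_overlap characters after the
    # previous one, so adjacent chunks overlap by chunk_overlap characters.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("a" * 1000)
print(len(chunks), [len(c) for c in chunks])  # 3 [500, 500, 100]
```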

3️⃣ Embeddings

  • Model: nomic-embed-text
  • Type: Dense semantic embeddings
  • Converts text → vector representations

4️⃣ Vector Store

  • Database: ChromaDB
  • Stores:
    • Text chunks
    • Corresponding embeddings
  • Persistence enabled (chroma_db/)
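
Persistence means chunks and their embeddings survive restarts; ChromaDB handles this internally in chroma_db/. As a dependency-free illustration of the same idea (function names here are hypothetical, not ChromaDB's API), a JSON-backed store looks like:

```python
import json
import os
import tempfile

def save_store(path: str, chunks: list[str], embeddings: list[list[float]]) -> None:
    # Persist text chunks alongside their embeddings to disk.
    with open(path, "w") as f:
        json.dump({"chunks": chunks, "embeddings": embeddings}, f)

def load_store(path: str) -> tuple[list[str], list[list[float]]]:
    # Reload the persisted store after a restart.
    with open(path) as f:
        data = json.load(f)
    return data["chunks"], data["embeddings"]

store_path = os.path.join(tempfile.gettempdir(), "rag_store.json")
save_store(store_path, ["hello world"], [[0.1, 0.2]])
chunks, embeddings = load_store(store_path)
print(chunks)  # ['hello world']
```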

5️⃣ Retriever

retriever = vector_db.as_retriever()

  • Performs similarity search
  • Returns top-K relevant chunks
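
Under the hood, the retriever ranks stored chunks by vector similarity to the query embedding. A minimal cosine-similarity top-K search, in pure Python for illustration (the vectors and `top_k` helper are made up for the example):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    # store holds (chunk_text, embedding) pairs; return the k most similar chunks.
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

store = [("cats", [1.0, 0.0]), ("dogs", [0.7, 0.7]), ("cars", [0.0, 1.0])]
print(top_k([1.0, 0.1], store))  # ['cats', 'dogs']
```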

6️⃣ LLM (Local Inference)

  • Model: llama3 (via Ollama)
  • Role:
    • Context-aware answer generation
    • Evaluation (via RAGAS wrapper)

7️⃣ Prompt Design

Answer only using the context below.

  • Reduces hallucination
  • Ensures grounded responses
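
A grounded prompt of this kind can be assembled as follows (the template wording beyond the instruction above is illustrative; the project's actual prompt may differ):

```python
PROMPT_TEMPLATE = """Answer only using the context below.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(question: str, chunks: list[str]) -> str:
    # Join the retrieved chunks into the context slot of the template.
    return PROMPT_TEMPLATE.format(context="\n\n".join(chunks), question=question)

print(build_prompt("What is RAG?", ["RAG grounds LLM answers in retrieved documents."]))
```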

📊 Evaluation with RAGAS

This project integrates RAGAS for systematic evaluation.

Metrics Used

Metric             Description
context_precision  Relevance of the retrieved chunks to the question
context_recall     Coverage of the relevant information by the retrieved chunks
faithfulness       Consistency of the answer with the retrieved context
answer_relevancy   Relevance of the answer to the question

Dataset Format

{
  question,
  answer,
  retrieved_contexts,
  contexts,
  ground_truth
}

⚠️ Replace ground_truth with expected answers for meaningful evaluation
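
One evaluation record matching the field names above can be built like this (pure Python; in the project the assembled records are then handed to RAGAS for scoring, which this sketch does not do, and `make_eval_record` is an illustrative helper name):

```python
def make_eval_record(question: str, answer: str,
                     retrieved_contexts: list[str], ground_truth: str) -> dict:
    # One row of the evaluation dataset; 'contexts' mirrors
    # 'retrieved_contexts' as in the schema above.
    return {
        "question": question,
        "answer": answer,
        "retrieved_contexts": retrieved_contexts,
        "contexts": retrieved_contexts,
        "ground_truth": ground_truth,
    }

record = make_eval_record(
    "What is RAG?",
    "RAG grounds LLM answers in retrieved context.",
    ["RAG retrieves relevant chunks before generation."],
    "RAG augments generation with retrieved documents.",
)
```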


🚀 Getting Started

1️⃣ Install Ollama

Download and install Ollama.


2️⃣ Pull Required Models

ollama pull llama3
ollama pull nomic-embed-text

3️⃣ Install Dependencies

pip install -r requirements.txt

4️⃣ Add Documents

Place your files inside:

data/

5️⃣ Run the Application

python main.py
🧠 Key Concepts

Concept     Description
Embeddings  Vector representation of text
Chunking    Splitting documents for processing
Retriever   Finds relevant content
Vector DB   Stores embeddings
LLM         Generates responses
RAGAS       Evaluates system quality

⚠️ Limitations

  • No keyword-based retrieval (BM25)
  • No hybrid search (dense + sparse)
  • No re-ranking layer
  • Performance depends on embedding quality

🚀 Future Enhancements

  • Hybrid Search (BM25 + Vector)
  • Cross-encoder re-ranking
  • Metadata filtering
  • Query expansion techniques
  • REST API (FastAPI)

🏁 Conclusion

This project demonstrates a complete local RAG architecture, combining:

  • Dense semantic retrieval
  • Context-aware LLM generation
  • Quantitative evaluation

👉 A strong foundation for building production-grade AI applications.


📬 Use Cases

  • Enterprise document search
  • Knowledge base assistants
  • Internal Q&A systems
  • Research assistants

This project was developed as part of my learning journey in Python. GPT models were used as a learning assistant to understand concepts and structure the code. The final implementation, testing, and project setup were completed by infoanupampal@gmail.com.

Built with ❀️ by anupamLab
