TierRAG is a modular, high-performance Retrieval-Augmented Generation (RAG) system built with FastAPI, MongoDB Atlas, and Pinecone. It is designed to handle user-authenticated document ingestion, multi-tier intelligent caching, and configurable chunking strategies.
Key capabilities include:
- Multi-Tier Caching: Implements an in-memory 3-tier cache (Exact, Semantic, and Retrieval) to aggressively minimize latency and LLM API costs.
- Advanced Chunking: Supports both simple Recursive Character chunking and relationship-preserving Parent-Child chunking.
- Document Versioning: Automatically manages active vs. archived versions of documents in MongoDB so overlapping document uploads don't pollute the vector space.
- Namespace Isolation: Enforces secure, per-user data isolation in Pinecone via JWT-based authentication.
- Reranking & Generation: Utilizes cross-encoder reranking (`bge-reranker-v2-m3`) and Groq's high-speed inference API (`llama-3.3-70b-versatile`) for highly accurate context generation.
- Deep Observability: Integrated with LangSmith to trace chunking, embedding, vector retrieval, and LLM generation at a granular level.
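To make the caching tiers concrete, here is a minimal sketch of a 3-tier lookup (exact match, then semantic similarity, then cached retrieval results). The class name, threshold value, and data layout are illustrative assumptions, not TierRAG's actual implementation in `src/caching/`:

```python
import math

class TieredCache:
    """Hypothetical sketch of a 3-tier in-memory cache lookup."""

    def __init__(self, semantic_threshold: float = 0.92):
        self.exact = {}       # query string -> final answer (tier 1)
        self.semantic = []    # list of (query embedding, answer) (tier 2)
        self.retrieval = {}   # query string -> retrieved chunks (tier 3)
        self.semantic_threshold = semantic_threshold  # illustrative value

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    def lookup(self, query: str, embedding):
        # Tier 1: exact string match -- cheapest, skips retrieval and the LLM.
        if query in self.exact:
            return "exact", self.exact[query]
        # Tier 2: semantic match -- reuse the answer for a near-duplicate query.
        for emb, answer in self.semantic:
            if self._cosine(embedding, emb) >= self.semantic_threshold:
                return "semantic", answer
        # Tier 3: retrieval cache -- skip the vector search but still call the LLM.
        if query in self.retrieval:
            return "retrieval", self.retrieval[query]
        return "miss", None
```

On a tier-1 or tier-2 hit no LLM call is made at all, which is where the latency and cost savings come from; a tier-3 hit still generates but avoids the Pinecone round trip.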
For a deep dive into the system's architecture, flow diagrams, and design decisions, see the Architecture Document.
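The Parent-Child chunking mentioned in the capabilities above can be sketched as follows: small child chunks are what get embedded for precise retrieval, while each child keeps a pointer back to its larger parent, which is what the LLM ultimately sees. The sizes and dictionary layout here are illustrative assumptions, not the actual logic in `src/chunking/`:

```python
def parent_child_chunks(text: str, parent_size: int = 400, child_size: int = 100):
    """Split text into large parent chunks, then split each parent into
    small child chunks that remember their parent id. Sizes are
    illustrative, not TierRAG's defaults."""
    chunks = []
    for p_id, start in enumerate(range(0, len(text), parent_size)):
        parent = text[start:start + parent_size]
        children = [parent[i:i + child_size]
                    for i in range(0, len(parent), child_size)]
        chunks.append({"parent_id": p_id, "parent": parent, "children": children})
    return chunks
```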
You will need:

- Python 3.9+
- A MongoDB Atlas cluster (or local MongoDB string)
- A Pinecone account (API Key)
- A Groq account (API Key for LLM generation)
Clone the repository:

```bash
git clone <your-repo-url>
cd rag-project-2
```

Create and activate a virtual environment, then install the required Python packages:
```bash
python -m venv .venv

# On Windows:
.venv\Scripts\activate

# On macOS/Linux:
source .venv/bin/activate

pip install -r requirements.txt
```

Copy the `.env.example` file to create your own configuration file:
```bash
cp .env.example .env
```

Open `.env` and configure your credentials:
```env
# MongoDB Connection (make sure to whitelist your IP in Atlas)
CONNECTION_STRING="mongodb+srv://<user>:<password>@cluster0...mongodb.net/"

# Auth Configuration
SECRET_KEY="your_secure_randomly_generated_jwt_secret"
ALGORITHM="HS256"

# Vector Database (Pinecone)
PINECONE_API_KEY="your_pinecone_api_key_here"

# LLM Generation (Groq)
GROQ_API_KEY="your_groq_api_key_here"

# Observability (LangSmith)
LANGCHAIN_TRACING_V2="true"
LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
LANGCHAIN_API_KEY="your_langsmith_api_key_here"
LANGCHAIN_PROJECT="TierRAG"
```

(Note: chunking parameters, cache thresholds, and model selections are further configurable in `src/config.py`.)
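Since `src/config.py`'s contents aren't shown here, the sketch below only illustrates one common pattern for loading such settings: read the required keys from the environment and fail fast when one is missing. The field names mirror the `.env` keys above; the chunking and cache defaults are hypothetical placeholders:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    connection_string: str
    pinecone_api_key: str
    groq_api_key: str
    chunk_size: int = 1000                 # hypothetical chunking parameter
    semantic_cache_threshold: float = 0.9  # hypothetical cache threshold

def load_settings() -> Settings:
    """Read required keys from the environment, failing fast if one is absent."""
    def required(name: str) -> str:
        value = os.getenv(name)
        if value is None:
            raise RuntimeError(f"Missing required environment variable: {name}")
        return value

    return Settings(
        connection_string=required("CONNECTION_STRING"),
        pinecone_api_key=required("PINECONE_API_KEY"),
        groq_api_key=required("GROQ_API_KEY"),
    )
```

Failing fast at startup surfaces a missing credential immediately instead of as a confusing connection error deep inside a request handler.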
- Start the FastAPI Server: Ensure your virtual environment is activated, then run the application using Uvicorn:

  ```bash
  uvicorn main:app --reload
  ```

- Access the API Documentation: Once the server is running, open your browser and navigate to the interactive Swagger UI: 👉 http://localhost:8000/docs
General usage flow:

- Register & Login: Use `POST /api/admin/register`, then `POST /api/admin/login` to get a JWT Bearer token.
- Authorize: Click the "Authorize" button in Swagger UI and paste the Bearer token.
- Upload Data: Use `POST /upload` to upload a `.pdf` or `.txt` file.
- Query: Use `POST /query` to ask questions about your uploaded documents and see the multi-tier caching in action!
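The same flow can be scripted outside Swagger UI. The endpoints below come from the steps above, but the request/response field names (`username`, `password`, `access_token`, `query`) are assumptions — check the generated `/docs` schema for the real ones:

```python
import requests

BASE_URL = "http://localhost:8000"  # default uvicorn address from above

def login(username: str, password: str) -> str:
    """POST /api/admin/login and return the JWT access token.
    Payload and response field names are assumptions."""
    resp = requests.post(f"{BASE_URL}/api/admin/login",
                         data={"username": username, "password": password})
    resp.raise_for_status()
    return resp.json()["access_token"]

def bearer(token: str) -> dict:
    """Build the Authorization header that Swagger's 'Authorize' button sets."""
    return {"Authorization": f"Bearer {token}"}

def upload(token: str, path: str) -> dict:
    """POST /upload with a .pdf or .txt file, scoped to this user's namespace."""
    with open(path, "rb") as f:
        resp = requests.post(f"{BASE_URL}/upload",
                             headers=bearer(token),
                             files={"file": f})
    resp.raise_for_status()
    return resp.json()

def query(token: str, question: str) -> dict:
    """POST /query to ask a question; repeated questions should hit the cache."""
    resp = requests.post(f"{BASE_URL}/query",
                         headers=bearer(token),
                         json={"query": question})  # field name is an assumption
    resp.raise_for_status()
    return resp.json()
```

Issuing the same `query()` twice is an easy way to watch the exact-match cache tier respond on the second call.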
Project structure:

```
.
├── main.py              # FastAPI app entrypoint
├── requirements.txt     # Project dependencies
├── api/
│   ├── auth/            # JWT registration & login
│   ├── ingestion/       # Versioned document upload
│   └── generation/      # Query & answer generation (caching integration)
├── src/
│   ├── config.py        # Global config (MongoDB, Pinecone, chunking, cache thresholds)
│   ├── chunking/        # Splitting logic (Parent-Child & Recursive Character)
│   ├── embedding/       # Pinecone vectorization and indexing
│   ├── retrieval/       # Active doc filtering & reranking
│   ├── generation/      # Groq LLM integration
│   └── caching/         # 3-tier in-memory dictionary caches (Exact, Semantic, Retrieval)
└── docs/
    └── architecture.md  # Detailed diagrams & design decisions
```