DocuMind - Intelligent Document Chat

Built by: kk

DocuMind is a RAG (Retrieval-Augmented Generation) application that lets users ask questions about their documents in natural language. It combines Google's Gemini models, Groq's high-speed inference, and vector search to return context-aware answers grounded in your uploaded files and connected databases.

Live Demo

Frontend Application: https://querywise.vercel.app
Backend API: https://documind-p046.onrender.com

Modular RAG Architecture

DocuMind is built on a modular RAG framework in which each stage of retrieval and generation is a separate service. The system has six main components:

Orchestrator (RAG Service): Coordinates the entire workflow, managing the flow of data between user input, retrieval, and generation.
Document Ingestion Engine: Handles the parsing, cleaning, and segmentation of various file formats (PDF, DOCX, TXT).
Embedding Service: Transforms text chunks into high-dimensional vector embeddings for semantic understanding.
Vector Store (Pinecone): Manages high-performance similarity search and retrieval of context.
Parent-Child Indexing Service: Implements advanced indexing strategies to maximize retrieval quality.
Generation Service (Gemini/Groq): Leverages state-of-the-art LLMs to synthesize answers using the retrieved context.
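
As an illustration of how the Orchestrator might tie the other components together, here is a minimal sketch. The function name `answer_question`, its parameters, and the prompt wording are assumptions made for this example and are not taken from the DocuMind codebase. The embedding, retrieval, and generation services are passed in as plain callables so that the flow is visible without any external dependency.

```python
from typing import Callable, Sequence


def answer_question(
    question: str,
    embed: Callable[[str], Sequence[float]],
    retrieve: Callable[[Sequence[float], int], list[str]],
    generate: Callable[[str], str],
    top_k: int = 4,
) -> str:
    """Hypothetical orchestrator: embed the question, fetch context, ask the LLM."""
    # Embedding Service: turn the question into a vector.
    vector = embed(question)
    # Vector Store: fetch the top_k most similar passages.
    passages = retrieve(vector, top_k)
    # Generation Service: answer using only the retrieved context.
    context = "\n\n".join(passages)
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return generate(prompt)
```

In a real deployment the three callables would wrap the embedding model, the Pinecone index, and the Gemini or Groq client respectively.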

Vector Storing Methods: Parent & Child Indexing

To overcome the limitations of standard chunking, DocuMind employs a Parent-Child Indexing strategy:

Child Chunks: Small, dense text segments (~300 chars) responsible for high-accuracy semantic search.
Parent Documents: Larger context blocks linked to the child chunks.
Retrieval Logic: When a child chunk matches a user's query, the system retrieves its corresponding Parent Document. This ensures the LLM receives full, coherent context rather than fragmented snippets.
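
The parent-child idea described above can be sketched in a few lines of Python. This is an illustrative toy, not DocuMind's implementation: the names `build_index` and `retrieve_parent`, the fixed-width character splitting, the 1200-character parent size, and the word-overlap score (standing in for real embedding similarity) are all assumptions chosen to keep the example dependency-free.

```python
from dataclasses import dataclass


@dataclass
class Child:
    text: str
    parent_id: int  # index of the parent block this chunk was cut from


def split(text: str, size: int) -> list[str]:
    """Split text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def build_index(document: str, parent_size: int = 1200, child_size: int = 300):
    """Return (parents, children); each child records its parent's index."""
    parents = split(document, parent_size)
    children = [
        Child(text=piece, parent_id=pid)
        for pid, parent in enumerate(parents)
        for piece in split(parent, child_size)
    ]
    return parents, children


def overlap_score(query: str, text: str) -> int:
    """Toy relevance score: distinct query words that appear in the text."""
    words = set(text.lower().split())
    return sum(1 for w in set(query.lower().split()) if w in words)


def retrieve_parent(query: str, parents: list[str], children: list[Child]) -> str:
    """Search over the small child chunks, then return the matching parent."""
    best = max(children, key=lambda c: overlap_score(query, c.text))
    return parents[best.parent_id]
```

The search runs over the small chunks, where a match is easier to pin down, but the caller receives the larger parent block, which is what would be passed to the LLM as context.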

Key Features

🧠 Advanced RAG Engine

Chat with Documents: Upload PDFs, DOCX, or text files and ask questions in natural language.
Transparent AI: View Vector Scores (V) and Resonance Scores (R) to understand exactly why a document was retrieved.
Strict Scoring: Uses a local reranker (ms-marco-MiniLM-L-12-v2) to rescore retrieved passages, prioritizing precision over recall.
HyDE: Uses Hypothetical Document Embeddings to improve search relevance.
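
To make the HyDE and reranking steps concrete, the sketch below shows one plausible way to combine them. It is a hedged illustration only: `hyde_search` and all of its parameters are hypothetical names, and the LLM, embedding model, vector search, and reranker are injected as callables. The source does not say how DocuMind loads its reranker; a cross-encoder such as the ms-marco-MiniLM model could fill the `rerank_score` slot.

```python
from typing import Callable, Sequence


def hyde_search(
    query: str,
    generate: Callable[[str], str],
    embed: Callable[[str], Sequence[float]],
    search: Callable[[Sequence[float], int], list[str]],
    rerank_score: Callable[[str, str], float],
    top_k: int = 10,
    keep: int = 3,
) -> list[tuple[str, float]]:
    """Hypothetical HyDE retrieval followed by a strict rerank."""
    # 1. HyDE: ask the LLM to draft a passage that would answer the query.
    hypothetical = generate(f"Write a short passage that answers: {query}")
    # 2. Embed the hypothetical passage instead of the raw query.
    vector = embed(hypothetical)
    # 3. Vector search returns a broad candidate set.
    candidates = search(vector, top_k)
    # 4. Rerank each candidate against the ORIGINAL query and keep the best.
    scored = [(doc, rerank_score(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:keep]
```

Embedding a drafted answer rather than the question tends to land closer to answer-like passages in vector space, while reranking against the original query guards against drift introduced by the draft.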

📊 Database Intelligence & Visualization

Text-to-SQL: Connect your database and ask questions in plain English (e.g., "Show me total sales by region").
Auto-Visualization: The system inspects the shape of the query results and automatically generates a bar, line, or pie chart.
Schema Awareness: Automatically extracts and understands your database structure for accurate queries.
SQL Safety: Built-in validator prevents destructive queries and ensures syntax correctness.
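
A validator of the kind described under SQL Safety could look roughly like the following. This is a conservative sketch under stated assumptions, not DocuMind's actual validator: the function name `is_safe_select` and the keyword list are invented for illustration, and the check only covers the "no destructive queries" half. It does not verify syntax, which would normally be left to a SQL parser or to the database itself.

```python
import re

# Keywords that modify data or schema (an illustrative, non-exhaustive list).
FORBIDDEN = {
    "insert", "update", "delete", "drop", "alter", "truncate",
    "create", "grant", "revoke", "replace", "merge",
}


def is_safe_select(sql: str) -> bool:
    """Return True only for a single read-only SELECT (or WITH ... SELECT)."""
    # Strip line comments and block comments before inspecting the text.
    no_comments = re.sub(r"--[^\n]*|/\*.*?\*/", " ", sql, flags=re.DOTALL)
    statement = no_comments.strip().rstrip(";").strip()
    # Reject empty input and anything that contains a second statement.
    if not statement or ";" in statement:
        return False
    tokens = re.findall(r"[a-z_]+", statement.lower())
    if not tokens or tokens[0] not in ("select", "with"):
        return False
    # Reject any statement that mentions a write or DDL keyword.
    return not any(token in FORBIDDEN for token in tokens)
```

The check errs on the side of rejection: a harmless query whose string literal contains a semicolon or a word such as "delete" is refused too. Running generated SQL under a read-only database role is a sensible second line of defence.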

⚡ Performance & Security

Multi-Model Support: Switch between Google Gemini 2.5 for reasoning and Groq (Llama 3) for ultra-fast responses.
Secure Authentication: Robust user management with unique user IDs and secure session handling.
Smart History: Persistent chat sessions allowing you to revisit previous conversations.
Mobile Responsive: A modern, mobile-friendly interface built with React and Tailwind CSS.

Technology Stack

Frontend: React, Vite, Tailwind CSS, Recharts
Backend: Python, FastAPI
AI/LLM: Google Gemini, Groq
Vector DB: Pinecone
Database: MongoDB Atlas (Chat History), PostgreSQL/MySQL (Data Analysis)
Deployment: Vercel (Frontend), Render (Backend)

⚠️ Project Protection Notice

This project is protected and authenticated with embedded signature verification. The signature system ensures proper attribution and project integrity.

Key Protection Features:

🔒 Signature Verification: Both backend and frontend verify the project signature on startup
🛡️ Tamper Protection: Removing or modifying the signature will prevent the application from running
👤 Developer Attribution: Clear attribution to the original developer throughout the application
🔐 License Protection: Built-in license key validation system

Protected Files:

.signature (Backend & Frontend signature files)
api/lib/signature_guard.py (Backend verification service)
frontend/src/utils/signatureGuard.js (Frontend verification service)

⚠️ WARNING: Do not remove or modify the signature files or verification code. The application will not function without valid signatures.

