🤖 SmartBot

A Streamlit application for conversing with your documents, built on Retrieval-Augmented Generation (RAG) and Google's Gemini AI.

✨ Features

📄 Multi-Format Support: PDF, Word (DOCX), PowerPoint (PPTX), HTML, Text/Markdown, and Images
🧠 Smart Retrieval: Uses FAISS vector store with sentence transformers for accurate document retrieval
💬 Conversational AI: Powered by Google's Gemini Flash for intelligent responses
📚 Source References: View exact document snippets used to generate answers
🎨 Modern UI: Clean, intuitive interface with custom styling
💾 Session Memory: Maintains conversation context throughout your session

🧠 Advanced Features

🔍 OCR Capability: Extract text from scanned documents and images using Tesseract
🔄 Hybrid Search: Combines semantic search (embeddings) with BM25 keyword matching
🎯 Query Decomposition: Breaks complex queries into sub-questions for better accuracy
📊 Document Reranking: Uses cross-encoder models to rank retrieved documents by relevance
📝 Smart Summaries: Generate brief, detailed, or comprehensive document summaries
❓ Follow-up Questions: Auto-generates contextual follow-up questions
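
Hybrid search needs a way to merge the semantic ranking and the BM25 ranking into one list. The README does not say how SmartBot does this, so the sketch below uses reciprocal rank fusion (RRF) as one common option. The function name, the chunk ids, and the constant k=60 are illustrative assumptions, not taken from the project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, not a value
    taken from SmartBot.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results: chunk ids as ranked by each retriever.
semantic_hits = ["c3", "c1", "c7"]
bm25_hits = ["c1", "c9", "c3"]
fused = reciprocal_rank_fusion([semantic_hits, bm25_hits])
```

Chunks that both retrievers rank highly (here c1 and c3) rise to the top, while chunks found by only one retriever are kept but ranked lower.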

🚀 Quick Start

Installation

Clone the repository

git clone https://github.com/AjayChikate/SmartBot.git
cd SmartBot

Install dependencies

pip install -r requirements.txt

Set up environment variables

Create a .env file in the project root:

GOOGLE_API_KEY=your_google_api_key_here
TESSERACT_CMD=C:/Program Files/Tesseract-OCR/tesseract.exe
POPPLER_PATH=C:/Program Files/poppler/Library/bin
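
As a rough illustration of how these variables might be consumed, the sketch below reads them from the process environment and fails early if the API key is missing. It assumes the .env values have already been loaded into the environment (for example with python-dotenv's `load_dotenv`). The `load_settings` helper is hypothetical, and treating the two paths as optional is an assumption of this sketch rather than documented SmartBot behavior.

```python
import os


def load_settings(env=os.environ):
    """Collect settings from environment variables (hypothetical helper).

    GOOGLE_API_KEY is treated as required. TESSERACT_CMD and POPPLER_PATH
    are treated as optional here, which is an assumption of this sketch.
    """
    api_key = env.get("GOOGLE_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to .env")
    return {
        "google_api_key": api_key,
        # Empty or missing values become None so callers can fall back
        # to whatever is on the system PATH.
        "tesseract_cmd": env.get("TESSERACT_CMD") or None,
        "poppler_path": env.get("POPPLER_PATH") or None,
    }
```

Calling `load_settings()` once at startup surfaces a missing key before any documents are processed.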

Run the application

streamlit run app.py

🎯 Usage

Upload Documents: Use the sidebar to upload one or more documents in supported formats
Enable OCR: Toggle OCR if you have scanned documents or images with text
Process: Click "Process Documents" to build the knowledge base
Chat: Ask questions about your documents in natural language
View Sources: Expand the source references to see which document sections were used
Follow-ups: Click a suggested follow-up question to dive deeper
Summary: Generate document summaries (Brief/Detailed/Comprehensive)

🏗️ Project Structure

SmartBot/
│
├── app.py                 # Main Streamlit application
├── rag.py                 # RAG pipeline (chunking, vectorstore, conversation chain)
├── processor.py           # Document processing orchestrator
├── extraction.py          # Text extraction for various file formats
├── ocr.py                 # OCR functionality using Tesseract
├── htmlTemplates.py       # CSS and HTML templates for UI
├── requirements.txt       # Project dependencies
├── .env                   # Environment variables (not in repo)
└── README.md

📊 Metrics

Embedding Model: BAAI/bge-small-en-v1.5 (384 dimensions)
Reranking Model: cross-encoder/ms-marco-MiniLM-L-6-v2
Vector Store: Chroma (SQLite-backed)
LLM: Google Gemini Flash

🚀 Performance Tips

GPU Acceleration: Enable for faster embeddings
- Requires CUDA-capable NVIDIA GPU
- ~3-5x faster embedding generation
Chunking Strategy:
- Smaller chunks (200-300): Better precision
- Larger chunks (500-600): Better context
Retrieval Count:
- Small documents: k=4-5
- Large documents: k=8-10
Reranking: Disable when fewer than 5 documents are retrieved
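
To make the chunk-size trade-off concrete, here is a minimal sliding-window chunker. It is a sketch under stated assumptions, not the splitter in rag.py: sizes are measured in characters (the README does not state the unit), and the `chunk_text` name and the 50-character overlap are illustrative.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping character windows (illustrative only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, so no trailing chunk
        # consists purely of already-covered overlap.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


sample = "x" * 1200  # stand-in for extracted document text
large = chunk_text(sample, chunk_size=500, overlap=50)
small = chunk_text(sample, chunk_size=250, overlap=50)
```

For this 1200-character sample, the larger setting yields 3 chunks and the smaller one yields 6, so smaller chunks give the retriever more, narrower candidates to choose from at the cost of less context per chunk.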

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

⭐ If you find this project helpful, please consider giving it a star!
