Skip to content

shreyasren/LocalRAG

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

4 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

LocalRAG - Enhanced Document Q&A System

A powerful, completely local RAG (Retrieval-Augmented Generation) system with advanced technical document processing that answers questions without requiring any external APIs or internet connection. Perfect for handling sensitive technical documents while maintaining privacy and da## πŸ“‹ Repository Structure

LocalRAG/
β”œβ”€β”€ πŸ“„ README.md                    # Complete documentation (this file)
β”œβ”€β”€ πŸ“„ LICENSE                      # MIT License
β”œβ”€β”€ 🐍 doc-qa-local.py              # ✨ Main application script
β”œβ”€β”€ πŸ”§ enhanced_processor.py        # ✨ Technical document processing module
β”œβ”€β”€ πŸ“‹ local-requirements.txt       # Python dependencies
β”œβ”€β”€ πŸ”§ setup.sh                     # Automated setup script
β”œβ”€β”€ πŸ“ local_chroma_db/             # Vector database storage (created after processing)
β”œβ”€β”€ πŸ“ venv/                        # Python virtual environment (created during setup)
β”œβ”€β”€ πŸ“„ test_document.txt            # Sample test document
└── πŸ“„ .gitignore                   # Git ignore patterns

Core Files

doc-qa-local.py - Main application with CLI interface

  • Document processing with enhanced chunking
  • Multiple answer generation modes
  • Interactive Q&A sessions
  • Extensive command-line options

enhanced_processor.py - Advanced document processing

  • Technical document pattern recognition
  • Smart chunking with section awareness
  • Enhanced vector search with relevance scoring
  • Comprehensive answer generation

local-requirements.txt - Python dependencies

  • sentence-transformers (embeddings)
  • chromadb (vector database)
  • transformers (Q&A models)
  • torch (PyTorch backend)
  • PyMuPDF (PDF processing)
  • pytesseract (OCR)
  • Additional supporting libraries

πŸ“ˆ Comparison with Cloud Solutions

Feature LocalRAG Cloud RAG
Privacy βœ… 100% Local ❌ Data sent to cloud
Cost βœ… Free after setup πŸ’° Per-query costs
Internet βœ… Works offline ❌ Requires internet
Setup ⚑ Quick local setup πŸ”§ API keys needed
Speed ⚑ Fast local inference 🐌 Network latency
Customization βœ… Full control ❌ Limited options
Data Security βœ… Your machine only ❌ Third-party servers

πŸš€ Advanced Usage Examples

Large Technical Document Processing

# Process a large technical manual
python doc-qa-local.py process 
  --document large_manual.pdf 
  --chunk-size 1500 
  --chunk-overlap 300 
  --cpu

# Query with comprehensive analysis
python doc-qa-local.py query --cpu --comprehensive

Batch Processing Multiple Documents

# Process multiple documents to the same database
python doc-qa-local.py process --document doc1.pdf --db-path shared_db --cpu
python doc-qa-local.py process --document doc2.pdf --db-path shared_db --cpu
python doc-qa-local.py query --db-path shared_db --cpu

Custom Embedding Models

# Use different embedding models for better semantic understanding
python doc-qa-local.py process 
  --document document.pdf 
  --embedding-model "sentence-transformers/all-mpnet-base-v2" 
  --cpu

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make your changes and test thoroughly
  4. Ensure all dependencies are in local-requirements.txt
  5. Update documentation if needed
  6. Submit a pull request with detailed description

πŸ“„ License

MIT License - feel free to use in your projects!

πŸŽ‰ What's New in This Version

  • βœ… Fully Tested Setup: Complete installation and usage validation
  • βœ… Enhanced Technical Processing: Specialized handling for technical documents
  • βœ… Multiple Answer Modes: Standard, comprehensive, and simple modes
  • βœ… Improved Documentation: Comprehensive setup and troubleshooting guides
  • βœ… macOS Compatibility: Verified working on Apple Silicon with proper CPU flags
  • βœ… Robust Error Handling: Better handling of GPU/CUDA issues and memory constraints

Ready to get started? The system is fully tested and ready to use! Simply follow the installation steps above and start processing your documents.

Need help? Check the troubleshooting section above or open an issue for support.

βœ… System Successfully Tested & Ready to Use!

The LocalRAG system has been fully tested and validated with:

  • βœ… Complete Environment Setup: Virtual environment with all dependencies installed
  • βœ… Document Processing Verified: Successfully processed test documents with enhanced chunking
  • βœ… Q&A System Operational: Interactive query system running with transformer models loaded
  • βœ… Technical Document Support: Enhanced processing for automotive/technical manuals
  • βœ… Multiple Answer Modes: Tested standard, comprehensive, and transformer-based responses

πŸš€ Key Features

Core Capabilities

  • πŸ”’ 100% Local Processing: No external APIs, no data leaves your machine
  • πŸ“„ Advanced OCR Support: Handles rotated and scanned PDFs with automatic text extraction
  • 🧠 Multiple AI Models: Choose from transformer models, Ollama, comprehensive analysis, or simple extraction
  • πŸ“Š Enhanced Smart Chunking: Technical document-aware splitting with section preservation
  • ⚑ Fast Vector Search: ChromaDB with enhanced relevance scoring for technical content
  • 🎯 Interactive Q&A: Terminal-based question-answering interface with multiple response modes
  • πŸ“š Source Attribution: Shows which document sections support each answer
  • πŸ”§ Confidence Scoring: Get confidence levels and model information for answers

Enhanced Features

  • πŸ”¬ Technical Document Processing: Specialized handling for technical manuals, specifications, and engineering documents
  • πŸ“‹ Comprehensive Answer Generation: Detailed, structured responses with signal lists and technical analysis
  • 🎚 Multiple Answer Modes: Standard, comprehensive, and simple modes for different use cases
  • πŸ” Context-Aware Search: Enhanced vector search that understands technical document structure
  • βš™ Command-Line Flexibility: Extensive options for customizing processing and answer generation

πŸ›  Installation & Setup - Verified Working!

βœ… Quick Setup (Tested on macOS)

  1. Clone and navigate to the repository

    git clone https://github.com/shreyasren/LocalRAG.git
    cd LocalRAG
  2. Run the automated setup script:

    chmod +x setup.sh
    ./setup.sh

βœ… Manual Setup (Recommended)

  1. Create virtual environment:

    python3 -m venv venv
    source venv/bin/activate  # On macOS/Linux
    # or
    venv\Scripts\activate     # On Windows
  2. Install dependencies:

    pip install --upgrade pip
    pip install -r local-requirements.txt
  3. Install system dependencies (macOS):

    brew install tesseract poppler

    Ubuntu/Debian:

    sudo apt-get install tesseract-ocr poppler-utils

    Windows:

⚠️ Important Notes from Testing

  • Use --cpu flag: Required on macOS to avoid CUDA errors
  • PyMuPDF Installation: Pre-compiled wheels work best; avoid compilation from source
  • Memory Requirements: 4-8GB RAM recommended for processing large documents

🎯 Usage - Tested & Working!

Step 1: Process Your Document βœ…

First, vectorize your document (one-time setup per document):

Basic Processing (Standard Chunking):

python doc-qa-local.py process --document your_document.pdf --cpu --no-enhanced

Enhanced Processing for Technical Documents (Recommended):

python doc-qa-local.py process --document technical_manual.pdf --cpu

Processing Options:

  • --cpu: Required on macOS to avoid GPU/CUDA issues
  • --chunk-size 1000: Text chunk size in characters (default: 1000)
  • --chunk-overlap 200: Overlap between chunks (default: 200)
  • --db-path ./local_chroma_db: Database location (default: ./local_chroma_db)
  • --embedding-model all-MiniLM-L6-v2: Embedding model (default: all-MiniLM-L6-v2)
  • --no-ocr: Disable OCR for faster processing (skip if document is rotated)
  • --no-enhanced: Disable enhanced technical document processing (use basic chunking)

πŸ“‹ Key Differences:

  • Enhanced Processing (Default): Uses smart chunking optimized for technical documents, better signal/component recognition
  • Basic Processing (--no-enhanced): Uses standard text chunking, faster but less optimized for technical content
  • Enhanced processing is recommended for technical manuals, specifications, and complex documents

βœ… Example (Successfully Tested):

python doc-qa-local.py process --document test_document.txt --cpu

# Output shows:
# βœ… Document processed successfully!
# You can now run queries with: python doc-qa-local.py query --cpu

Step 2: Query Your Document βœ…

After processing, start the interactive Q&A session:

Standard Mode (Transformer Models - Tested Working):

python doc-qa-local.py query --cpu

Comprehensive Mode (Detailed Technical Responses):

python doc-qa-local.py query --cpu --comprehensive

Basic Mode (Simple Extraction):

python doc-qa-local.py query --llm simple --cpu

βœ… Example Query Session:

============================================================
Local Document Q&A System
============================================================
Type your questions below. Type 'exit' or 'quit' to end.

πŸ“ Your Question: What is machine learning?

πŸ“š ANSWER:
Machine learning (ML) is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence (AI) based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.

[High confidence answer from RoBERTa]

πŸ€– Answer Generation Modes

1. Standard Mode (Default)

python doc-qa-local.py query --cpu
  • Method: Transformer models (DistilBERT, RoBERTa) with auto-enhancement
  • Quality: High-quality extractive answers with automatic supplementation for short responses
  • Speed: Fast inference on CPU/GPU
  • Best For: General questions and quick technical lookups

2. Comprehensive Mode (NEW!)

python doc-qa-local.py query --cpu --comprehensive
  • Method: Advanced multi-chunk analysis with structured response generation
  • Quality: Detailed, structured responses with signal lists, technical components, and section-based analysis
  • Features:
    • Extracts specific signal names and technical terms
    • Groups information by document sections
    • Creates bullet-point lists for complex queries
    • Provides comprehensive technical analysis
  • Best For: Technical documents, signal analysis, complex engineering questions

3. Ollama Integration

# First install Ollama
brew install ollama
ollama pull llama3.2:3b  # or llama3.2:7b for better quality

# Then use
python doc-qa-local.py query --llm ollama --cpu
  • Quality: Excellent, human-like responses
  • Models: Llama 3.2, Mistral, Phi-3, and more
  • Requires: Ollama installation and model download

4. Simple Extraction

python doc-qa-local.py query --llm simple --cpu
  • Speed: Fastest option
  • Quality: Basic keyword matching
  • Use Case: Quick searches or resource-constrained systems

πŸ”§ Technical Architecture & Enhancement Features

Enhanced Technical Document Processing

The LocalRAG system includes specialized enhancements for technical documents like manuals, specifications, and engineering documentation:

Smart Chunking & Pattern Recognition

  • Section-Aware Splitting: Preserves technical sections and subsections
  • Header Recognition: Identifies and maintains document structure
  • Technical Pattern Detection: Recognizes signal names, commands, and technical terminology
  • Context Preservation: Keeps section titles with content for better understanding

Enhanced Vector Search

  • Context Expansion: Groups related technical information from the same section
  • Technical Relevance Scoring: Prioritizes chunks with technical indicators over headers/metadata
  • Multi-Chunk Analysis: Combines information from multiple relevant document sections
  • Header Penalty: Reduces scoring for header-heavy content to focus on technical details

Multiple Answer Generation Modes

1. Standard Mode (Default - Tested Working)

  • Method: Transformer models (DistilBERT, RoBERTa) with auto-enhancement
  • Quality: High-quality extractive answers with automatic supplementation for short responses
  • Speed: Fast inference on CPU/GPU
  • Best For: General questions and quick technical lookups

2. Comprehensive Mode (Technical Documents)

  • Method: Advanced multi-chunk analysis with structured response generation
  • Features:
    • Extracts specific signal names and technical terms
    • Groups information by document sections
    • Creates bullet-point lists for complex queries
    • Provides comprehensive technical analysis
  • Best For: Technical documents, signal analysis, complex engineering questions

3. Ollama Integration (Optional)

  • Quality: Excellent, human-like responses
  • Models: Llama 3.2, Mistral, Phi-3, and more
  • Requires: Ollama installation and model download

4. Simple Extraction (Fastest)

  • Speed: Fastest option
  • Quality: Basic keyword matching
  • Use Case: Quick searches or resource-constrained systems

System Architecture

Document Input (PDF/TXT/DOCX)
    ↓
OCR Processing (if needed)
    ↓
Enhanced/Basic Chunking
    ↓
Embedding Generation (Sentence Transformers)
    ↓
Vector Database Storage (ChromaDB)
    ↓
Query Processing Pipeline:
User Question β†’ Question Embedding β†’ Enhanced Vector Search β†’ Context Creation β†’ Answer Generation β†’ Response with Sources

πŸ“Š Example Workflows

Basic Document Q&A

# 1. Process a document with OCR support
python doc-qa-local.py process --document bcm.pdf --cpu

# 2. Start Q&A with standard transformer models
python doc-qa-local.py query --cpu

# Example interaction:
πŸ“ Your Question: What are the main components of BCM?

πŸ“š ANSWER:
Body Control Module (BCM) includes lighting control, power management, and vehicle communication systems.

[High confidence answer from RoBERTa]

Technical Document Analysis

# 1. Process with enhanced technical features
python doc-qa-local.py process --document technical_spec.pdf --cpu

# 2. Use comprehensive mode for detailed analysis
python doc-qa-local.py query --cpu --comprehensive

# Example interaction:
πŸ“ Your Question: list all control signals for the system

πŸ“š ANSWER:
**System Control Signals:**
β€’ [Detailed signal list with technical context]
β€’ [Component relationships and dependencies]  
β€’ [Functional descriptions and usage patterns]

βš™οΈ Advanced Configuration

Custom Embedding Models

For better semantic search, try different embedding models:

python doc-qa-local.py process \
  --document document.pdf \
  --embedding-model "sentence-transformers/all-mpnet-base-v2"

GPU Acceleration

If you have a CUDA-compatible GPU:

python doc-qa-local.py query  # Remove --cpu flag for GPU acceleration

Disable Enhanced Processing

For simpler documents or faster processing:

python doc-qa-local.py process --document simple_doc.pdf --no-enhanced --cpu
python doc-qa-local.py query --no-enhanced --cpu

Batch Processing Multiple Documents

# Process multiple documents to the same database
python doc-qa-local.py process --document doc1.pdf --db-path shared_db
python doc-qa-local.py process --document doc2.pdf --db-path shared_db
python doc-qa-local.py query --db-path shared_db --llm transformers

πŸ” Answer Quality & Confidence

The system provides detailed feedback on answer quality:

  • High Confidence (>0.6): [High confidence answer from RoBERTa]
  • Moderate Confidence (0.3-0.6): [Answer from DistilBERT with moderate confidence: 0.45]
  • Low Confidence (<0.3): [Note: Low confidence answer. Consider asking a more specific question.]

πŸ“‹ Repository Structure

πŸ“ Vectorization Pipeline/
β”œβ”€β”€ πŸ“„ README.md                 # This file - comprehensive documentation
β”œβ”€β”€ πŸ“„ doc-qa-local.py          # Main application - process & query documents
β”œβ”€β”€ πŸ“„ local-requirements.txt   # Python dependencies
β”œβ”€β”€ πŸ“„ setup.sh                 # Automated setup script (cross-platform)
β”œβ”€β”€ πŸ“„ local-setup-guide.md     # Detailed manual setup instructions
β”œβ”€β”€ πŸ“ local_chroma_db/         # Vector database (created after processing)
β”œβ”€β”€ οΏ½ .venv/                   # Python virtual environment
└── πŸ“„ bcm.pdf                  # Example document (Body Control Module spec)

πŸš€ Performance & Optimization

For Large Documents (1000+ pages)

  • Chunk Size: Use 1500-2000 characters for better context
  • Processing Time: 30-60 minutes for initial vectorization
  • Memory: Requires 4-8GB RAM for large documents
  • Storage: ~1-3GB database size for very large documents

Speed Optimization

# Disable OCR for faster processing (if document isn't rotated)
python doc-qa-local.py process --document doc.pdf --no-ocr

# Use simple extraction for fastest queries
python doc-qa-local.py query --llm simple

πŸ›‘οΈ Privacy & Security

  • No Internet Required: All processing happens locally - successfully tested offline
  • No Data Transmission: Documents never leave your machine
  • No API Keys: No external service dependencies
  • Secure Storage: Local database with no external access
  • Portable: Can run on air-gapped systems

πŸ› Troubleshooting - Common Issues & Solutions

GPU/CUDA Issues (macOS)

Problem: AssertionError: Torch not compiled with CUDA enabled Solution: Always use the --cpu flag on macOS

python doc-qa-local.py process --document file.pdf --cpu
python doc-qa-local.py query --cpu

PyMuPDF Installation Issues

Problem: Compilation errors with climits header Solution: Use pre-compiled wheels

pip install PyMuPDF --no-deps  # Uses pre-compiled wheel

Out of Memory Errors

Problem: System runs out of RAM during processing Solutions:

# Use smaller chunks
python doc-qa-local.py process --document doc.pdf --chunk-size 500 --cpu

# Force CPU usage
python doc-qa-local.py query --cpu

# Disable enhanced processing for large documents
python doc-qa-local.py process --document doc.pdf --no-enhanced --cpu

OCR Dependencies Missing

Problem: Tesseract not found Solutions:

# macOS
brew install tesseract poppler

# Ubuntu/Debian
sudo apt-get install tesseract-ocr poppler-utils

# Or disable OCR
python doc-qa-local.py process --document doc.pdf --no-ocr --cpu

Poor Answer Quality

Problem: Getting empty or low-confidence answers Solutions:

# Use comprehensive mode for technical documents
python doc-qa-local.py query --cpu --comprehensive

# Try different chunk settings
python doc-qa-local.py process --document doc.pdf --chunk-size 1500 --chunk-overlap 400 --cpu

# Ensure enhanced processing is enabled (default)
python doc-qa-local.py process --document doc.pdf --cpu  # (enhanced by default)

Interactive Query Issues

Problem: EOF errors or input issues Solution: Run query mode directly without piping input

# Don't use: echo "question" | python doc-qa-local.py query
# Instead use: python doc-qa-local.py query --cpu

πŸ“Š Performance Expectations (Tested)

Document Processing

  • Small documents (1-10 pages): 30 seconds - 2 minutes
  • Medium documents (100-500 pages): 5-15 minutes
  • Large documents (1000+ pages): 30-60 minutes
  • Memory usage: 2-8GB RAM depending on document size

Query Performance

  • Search time: 1-3 seconds for vector search
  • Answer generation: 2-10 seconds depending on mode
  • Memory: 2-4GB RAM during queries

Tested Configuration (macOS M-series)

  • Python: 3.13.5
  • Processing: CPU-only with --cpu flag
  • Models: DistilBERT + RoBERTa for Q&A, all-MiniLM-L6-v2 for embeddings
  • Status: βœ… Fully functional and tested

πŸ“ˆ Comparison with Cloud Solutions

Feature This System Cloud RAG
Privacy βœ… 100% Local ❌ Data sent to cloud
Cost βœ… Free after setup πŸ’° Per-query costs
Internet βœ… Works offline ❌ Requires internet
Setup ⚑ Quick local setup πŸ”§ API keys needed
Speed ⚑ Fast local inference 🐌 Network latency

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test with different document types
  5. Submit a pull request

πŸ“„ License

MIT License - feel free to use in your projects!


Need help? Check the local-setup-guide.md for detailed setup instructions or open an issue for support.

About

Local document Q&A system using RAG architecture. Features OCR for scanned PDFs, multiple AI models (DistilBERT, RoBERTa), and ChromaDB vector search. 100% offline processing.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors